Session Title | Speaker | Type | Year
---|---|---|---
Open Architecture Tradeoffs (OAT): A simple, computational game engine for rapidly exploring hypotheses in Battle Management Command and Control (BMC2) (Abstract)
We created the Open Architecture Tradeoffs (OAT) tool, a simple, computational game engine for rapidly exploring hypotheses about mission effectiveness in Battle Management Command and Control (BMC2). Each run of an OAT game simulates a military mission in contested airspace. Game objects represent U.S., adversary, and allied assets, each of which moves through the simulated airspace. Each U.S. asset has a Command and Control (C2) package that controls its actions—currently, neural networks form the basis of each U.S. asset’s C2 package. The weights of the neural network are randomized at the beginning of each game and are updated over the course of the game as the U.S. asset learns which of its actions lead to rewards, e.g., intercepting an adversary. Weights are updated via a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) altered to accommodate a Reinforcement Learning paradigm. OAT allows a user to winnow down the trade space that should be considered when setting up more expensive and time-consuming campaign models. OAT could be used to weed out bad ideas for “fast failure”, thus avoiding waste of campaign modeling resources. OAT can be used to explore questions such as: Which combination of system capabilities is likely to be more or less effective in a particular military mission? For example, in an early analysis, OAT was used to test the hypothesis that increases in U.S. assets’ sensor range always lead to increases in mission effectiveness, quantified as the percent of adversaries intercepted. We ran over 2500 OAT games, each time varying the sensor range of U.S. assets and the density of adversary assets. Results show that increasing sensor range did lead to an increase in mission effectiveness—but only up to a certain point. Once the sensor range surpassed approximately 10-15% of the simulated airspace size, no further gains were made in the percent of adversaries intercepted. Thus, campaign modelers should hesitate to devote resources to exploring sensor range in isolation. More recent OAT analyses are exploring more complex hypotheses regarding the trade space between sensor range and communications range. |
Shelley Cazares | Breakout | 2019
Comparison of Methods for Testing Uniformity to Support the Validation of Simulation Models used for Live-Fire Testing (Abstract)
Goodness-of-fit (GOF) testing is used in many applications, including statistical hypothesis testing to determine if a set of data comes from a hypothesized distribution. In addition, combined probability tests are extensively used in meta-analysis to combine results from several independent tests to assess an overall null hypothesis. This paper summarizes a study conducted to determine which GOF and/or combined probability test(s) can be used to determine if a set of data with relatively small sample size comes from the standard uniform distribution, U(0,1). The power of several GOF tests and combined probability methods against different alternative hypotheses was examined. The GOF methods included: Anderson-Darling, Chi-Square, Kolmogorov-Smirnov, Cramér-Von Mises, Neyman-Barton, Dudewicz-van der Meulen, Sherman, Quesenberry-Miller, Frosini, and Hegazy-Green; while the combined probability test methods included: Fisher’s Combined Probability Test, Mean Z, Mean P, Maximum P, Minimum P, Logit P, and Sum Z. While no one method was determined to provide the best power in all situations, several useful methods to support model validation were identified. |
Shannon Shelburne | Breakout | 2019
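As a rough illustration of the kinds of methods compared in this study (not the study's own code), the sketch below checks a small sample against U(0,1) with two of the named GOF tests and combines p-values from independent tests with Fisher's and a Stouffer-type method, using generic SciPy routines; the sample and the p-values being combined are made up.

```python
# Illustrative only: is a small sample consistent with U(0,1), and how do
# combined probability tests pool p-values from independent tests?
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.uniform(0, 1, size=20)            # small sample, as in the study

ks = stats.kstest(sample, "uniform")           # Kolmogorov-Smirnov vs. U(0,1)
cvm = stats.cramervonmises(sample, "uniform")  # Cramér-von Mises vs. U(0,1)
print(f"KS p-value:  {ks.pvalue:.3f}")
print(f"CvM p-value: {cvm.pvalue:.3f}")

# Combined probability tests applied to p-values from independent tests
pvals = [0.04, 0.20, 0.35, 0.62]               # hypothetical p-values
f_stat, f_p = stats.combine_pvalues(pvals, method="fisher")
z_stat, z_p = stats.combine_pvalues(pvals, method="stouffer")  # a "Sum Z" analogue
print(f"Fisher combined p-value:   {f_p:.3f}")
print(f"Stouffer combined p-value: {z_p:.3f}")
```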
Valuing Human Systems Integration: A Test and Data Perspective (Abstract)
Technology advances are accelerating at a rapid pace, with the potential to enable greater capability and power to the Warfighter. However, if human capabilities and limitations are not central to concepts, requirements, design, and development, then new/upgraded weapons and systems will be difficult to train, operate, and maintain, may not result in the skills, job, grade, and manpower mix as projected, and may result in serious human error, injury or Soldier loss. The Army Human Systems Integration (HSI) program seeks to overcome these challenges by ensuring appropriate consideration and integration of seven technical domains: Human Factors Engineering (e.g., usability), Manpower, Personnel, Training, Safety and Occupational Health, Habitability, Force Protection and Survivability. The tradeoffs, constraints, and limitations occurring among and between these technical domains allow HSI to execute a coordinated, systematic process for putting the warfighter at the center of the design process – equipping the warfighter rather than manning equipment. To that end, the Army HSI Headquarters, currently a directorate within the Army Headquarters Deputy Chief of Staff (DCS), G-1, develops strategies and ensures human systems factors are early key drivers in concepts, strategy, and requirements, and are fully integrated throughout system design, development, testing and evaluation, and sustainment. The need to consider HSI factors early in the development cycle is critical. Too often, man-machine interface issues are not addressed until late in the development cycle (i.e. production and deployment phase) after the configuration of a particular weapon or system has been set. What results is a degraded combat capability, suboptimal system and system-of-systems integration, increased training and sustainment requirements, or fielded systems not in use. Acquisition test data are also good sources from which to glean HSI return on investment (ROI) metrics. Defense acquisition reports such as test and evaluation operational assessments identify HSI factors as root causes when Army programs experience increased cost, schedule overruns, or low performance. This is identifiable by the number and type of systems that require follow-on test and evaluation (FOT&E), overreliance on field service representatives (FSRs), costly and time-consuming engineering change requests (ECRs), or failures in achieving reliability, availability, and maintainability (RAM) key performance parameters (KPPs) and key system attributes (KSAs). In this presentation, we will present these data and submit several ROI metrics, closely aligned to the defense acquisition process, to emphasize and illustrate the value of HSI. Optimizing Warfighter-System performance and reducing human errors, minimizing risk of Soldier loss or injury, and reducing personnel and materiel life cycle costs produce data that are inextricably linked to early, iterative, and measurable HSI processes within the defense acquisition system. |
Jeffrey Thomas | Breakout | 2019
Air Force Human Systems Integration Program (Abstract)
The Air Force (AF) Human Systems Integration (HSI) program is led by the 711th Human Performance Wing’s Human Systems Integration Directorate (711 HPW/HP). 711 HPW/HP provides direct support to system program offices and AF Major Commands (MAJCOMs) across the acquisition lifecycle from requirements development to fielding and sustainment in addition to providing home office support. With an ever-increasing demand signal for support, HSI practitioners within 711 HPW/HP assess HSI domain areas for human-centered risks and strive to ensure systems are designed and developed to safely, effectively, and affordably integrate with human capabilities and limitations. In addition to system program offices and MAJCOMs, 711 HPW/HP provides HSI support to AF Centers (e.g., AF Sustainment Center, AF Test Center), the AF Medical Service, and special cases as needed. The AF Global Strike Command (AFGSC) is the largest MAJCOM with several Programs of Record (POR), such as the B-1, B-2, and B-52 bombers, Intercontinental Ballistic Missiles (ICBM), Ground-Based Strategic Deterrent (GBSD), Airborne Launch Control System (ALCS), and other support programs/vehicles like the UH-1N. Mr. Anthony Thomas (711 HPW/HP), the AFGSC HSI representative, will discuss how 711 HPW/HP supports these programs at the MAJCOM headquarters level and in the system program offices. |
Anthony Thomas | Breakout | 2019
Toward Real-Time Decision Making in Experimental Settings (Abstract)
Materials scientists, computer scientists and statisticians at LANL have teamed up to investigate how to make near real-time decisions during fast-paced experiments. For instance, a materials scientist at a beamline typically has a short window in which to perform a number of experiments, after which they analyze the experimental data, determine interesting new experiments and repeat. In typical circumstances, that cycle could take a year. The goal of this research and development project is to accelerate that cycle so that interesting leads are followed during the short window for experiments, rather than in years to come. We detail some of our UQ work in materials science, including emulation, sensitivity analysis, and solving inverse problems, with an eye toward real-time decision making in experimental settings. |
Devin Francom | Breakout | 2019
Area Validation for Applications with Mixed Uncertainty (Abstract)
Model validation is a process for determining how accurate a model is when compared to a true value. The methodology uses uncertainty analysis in order to assess the discrepancy between a measured and predicted value. In the literature, there have been several area metrics introduced to handle these types of discrepancies. These area metrics were applied to problems that include aleatory uncertainty, epistemic uncertainty, and mixed uncertainty. However, these methodologies lack the ability to fully characterize the true differences between the experimental and prediction data when mixed uncertainty exists in the measurements and/or in the predictions. This work will introduce a new area metric validation approach which aims to compensate for the shortcomings in current techniques. The approach will be described in detail and comparisons with existing metrics will be shown. To demonstrate its applicability, the new area metric will be applied to a stagnation point calibration probe’s surface predictions for low-enthalpy conditions. For this application, testing was performed in the Hypersonic Materials Environmental Test System (HYMETS) facility located at NASA Langley Research Center. |
Laura White | Breakout | 2019
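For readers unfamiliar with area metrics, the sketch below computes the basic quantity that such metrics build on: the area between the empirical CDF of measured data and the CDF implied by model predictions. The samples and function names are illustrative assumptions, not the HYMETS analysis or the new metric itself.

```python
# A minimal area metric: area between two empirical CDFs (measurements vs.
# model predictions). Data are synthetic.
import numpy as np

def ecdf(samples, grid):
    """Empirical CDF of `samples` evaluated at the points in `grid`."""
    samples = np.sort(samples)
    return np.searchsorted(samples, grid, side="right") / samples.size

def area_metric(experimental, predicted):
    """Area between the two empirical CDFs (a distance-type validation metric)."""
    grid = np.sort(np.concatenate([experimental, predicted]))
    gap = np.abs(ecdf(experimental, grid) - ecdf(predicted, grid))
    return np.sum(gap[:-1] * np.diff(grid))   # step functions are piecewise constant

rng = np.random.default_rng(0)
measured = rng.normal(10.0, 1.0, size=30)     # e.g., measured surface quantity
model = rng.normal(10.5, 1.5, size=1000)      # model predictions
print(f"Area metric: {area_metric(measured, model):.3f}")
```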
A 2nd-Order Uncertainty Quantification Framework Applied to a Turbulence Model Validation Effort (Abstract)
Computational fluid dynamics is now considered to be an indispensable tool for the design and development of scramjet engine components. Unfortunately, the quantification of uncertainties is rarely addressed with anything other than sensitivity studies, so the degree of confidence associated with the numerical results remains exclusively with the subject matter expert that generated them. This practice must be replaced with a formal uncertainty quantification process for computational fluid dynamics to play an expanded role in the system design, development, and flight certification process. Given the limitations of current hypersonic ground test facilities, this expanded role is believed to be a requirement by some in the hypersonics community if scramjet engines are to be given serious consideration as a viable propulsion system. The present effort describes a simple, relatively low cost, nonintrusive approach to uncertainty quantification that includes the basic ingredients required to handle both aleatoric (random) and epistemic (lack of knowledge) sources of uncertainty. The nonintrusive nature of the approach allows the computational fluid dynamicist to perform the uncertainty quantification with the flow solver treated as a “black box”. Moreover, a large fraction of the process can be automated, allowing the uncertainty assessment to be readily adapted into the engineering design and development workflow. In the present work, the approach is applied to a model scramjet isolator problem where the desire is to validate turbulence closure models in the presence of uncertainty. In this context, the relevant uncertainty sources are determined and accounted for to allow the analyst to delineate turbulence model-form errors from other sources of uncertainty associated with the simulation of the facility flow. |
Robert Baurle | Breakout | 2019
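The abstract does not spell out implementation details, so the following is only a generic, nonintrusive sketch of how aleatory and epistemic sources are often separated: epistemic inputs sampled in an outer loop, aleatory inputs in an inner loop, with the black-box solver replaced by a placeholder function and invented distributions.

```python
# Generic second-order Monte Carlo: an outer loop over epistemic inputs and an
# inner loop over aleatory inputs yields a family of output CDFs. The
# `flow_solver` below stands in for any black-box simulation.
import numpy as np

def flow_solver(mach, turbulence_constant):
    # placeholder for an expensive CFD run
    return turbulence_constant * mach**2 + 0.1 * mach

rng = np.random.default_rng(0)
n_epistemic, n_aleatory = 20, 500
samples = []
for _ in range(n_epistemic):
    c_turb = rng.uniform(0.07, 0.11)               # epistemic: interval only
    mach = rng.normal(2.5, 0.05, size=n_aleatory)  # aleatory: known distribution
    samples.append(flow_solver(mach, c_turb))

q95 = np.quantile(np.array(samples), 0.95, axis=1)  # 95th percentile per epistemic draw
print(f"95th-percentile output lies in [{q95.min():.3f}, {q95.max():.3f}]")
```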
Sources of Error and Bias in Experiments with Human Subjects (Abstract)
No set of experimental data is perfect and researchers are aware that data from experimental studies invariably contain some margin of error. This is particularly true of studies with human subjects since human behavior is vulnerable to a range of intrinsic and extrinsic influences beyond the variables being manipulated in a controlled experimental setting. Potential sources of error may lead to wide variations in the interpretation of results and the formulation of subsequent implications. This talk will discuss specific sources of error and bias in the design of experiments and present systematic ways to overcome these effects. First, some of the basic errors in general experimental design will be discussed, including human errors, systematic errors and random errors. Second, we will explore specific types of experimental error that appear in human subjects research. Lastly, we will discuss the role of bias in experiments with human subjects. Bias is a type of systematic error that is introduced into the sampling or testing phase and encourages one outcome over another. Often, bias is the result of the intentional or unintentional influence that an experimenter may exert on the outcomes of a study. We will discuss some common sources of bias in research with human subjects, including biases in sampling, selection, response, performance execution, and measurement. The talk will conclude with a discussion of how errors and bias influence the validity of human subjects research and will explore some strategies for controlling these errors and biases. |
Poornima Madhavan | Breakout | 2019
Deep Reinforcement Learning (Abstract)
An overview of Deep Reinforcement Learning and its recent successes in creating high-performing agents. Covering its application in “easy” environments up to massively complex multi-agent strategic environments. Will analyze the behaviors learned, discuss research challenges, and imagine future possibilities. |
Benjamin Bell | Breakout | 2019
Decentralized Signal Processing and Distributed Control for Collaborative Autonomous Sensor Networks (Abstract)
Collaborative autonomous sensor networks have recently been used in many applications including inspection, law enforcement, search and rescue, and national security. They offer scalable, low cost solutions which are robust to the loss of multiple sensors in hostile or dangerous environments. While often comprised of less capable sensors, the performance of a large network can approach the performance of far more capable and expensive platforms if nodes are effectively coordinating their sensing actions and data processing. This talk will summarize work to date at LLNL on distributed signal processing and decentralized optimization algorithms for collaborative autonomous sensor networks, focusing on ADMM-based solutions for detection/estimation problems and sequential greedy optimization solutions which maximize submodular functions, e.g. mutual information. |
Ryan Goldhahn | Breakout | 2019
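As a generic illustration of sequential greedy optimization of a submodular objective (not LLNL's algorithms), the sketch below greedily picks sensors to maximize the log-determinant of the selected measurement covariance, a standard information measure for Gaussian models; the covariance matrix is synthetic.

```python
# Greedy selection of k sensors to maximize a submodular information measure
# (Gaussian log-determinant). Illustrative only.
import numpy as np

def greedy_sensor_selection(cov, k):
    """Pick k sensor indices greedily to maximize log det of their covariance."""
    selected, remaining = [], list(range(cov.shape[0]))
    for _ in range(k):
        def gain(j):
            idx = selected + [j]
            return np.linalg.slogdet(cov[np.ix_(idx, idx)])[1]
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
A = rng.normal(size=(12, 12))
cov = A @ A.T + 1e-3 * np.eye(12)   # synthetic sensor covariance
print("Chosen sensors:", greedy_sensor_selection(cov, k=4))
```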
An Overview of Uncertainty-Tolerant Decision Support Modeling for Cybersecurity (Abstract)
Cyber system defenders face the challenging task of continually protecting critical assets and information from a variety of malicious attackers. Defenders typically function within resource constraints, while attackers operate at relatively low costs. As a result, design and development of resilient cyber systems that support mission goals under attack, while accounting for the dynamics between attackers and defenders, is an important research problem. This talk will highlight decision support modeling challenges under uncertainty within non-cooperative cybersecurity settings. Multiple attacker-defender game formulations under uncertainty are discussed with steps for further research. |
Samrat Chatterjee | Breakout | 2019
Software Reliability and Security Assessment: Automation and Frameworks (Abstract)
Software reliability models enable several quantitative predictions such as the number of faults remaining, failure rate, and reliability (probability of failure-free operation for a specified period of time in a specified environment). This talk will describe recent efforts in collaboration with NASA, including (1) the development of an automated script for the SFRAT (Software Failure and Reliability Assessment Tool) to streamline application of software reliability methods to ongoing programs, (2) application to a NASA program, (3) lessons learned, and (4) future directions for model and tool development to support the practical needs of the software reliability and security assessment frameworks. |
Lance Fiondella | Breakout | 2019
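To make the idea of a software reliability model concrete, here is a hedged sketch, not the SFRAT implementation, that fits the classic Goel-Okumoto model m(t) = a*(1 - exp(-b*t)) to notional failure times by maximum likelihood and predicts the faults remaining.

```python
# Fit the Goel-Okumoto NHPP software reliability model to notional failure
# times (hours) observed over a test window [0, T], by maximum likelihood.
import numpy as np
from scipy.optimize import minimize

failure_times = np.array([12, 40, 75, 130, 220, 300, 410, 560, 700, 890.0])
T = 1000.0  # total test hours observed

def neg_log_lik(params):
    a, b = np.exp(params)                     # keep a, b positive
    lam = a * b * np.exp(-b * failure_times)  # NHPP intensity at the failures
    return -(np.sum(np.log(lam)) - a * (1 - np.exp(-b * T)))

fit = minimize(neg_log_lik, x0=np.log([10.0, 0.001]), method="Nelder-Mead")
a_hat, b_hat = np.exp(fit.x)
print(f"Estimated total faults: {a_hat:.1f}, remaining: {a_hat - len(failure_times):.1f}")
print(f"Current failure intensity: {a_hat * b_hat * np.exp(-b_hat * T):.4f} per hour")
```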
AI & ML in Complex Environment (Abstract)
The U.S. Army Research Laboratory’s (ARL) Essential Research Program (ERP) on Artificial Intelligence & Machine Learning (AI & ML) seeks to research, develop and employ a suite of AI-inspired and ML techniques and systems to assist teams of soldiers and autonomous agents in dynamic, uncertain, complex operational conditions. Systems will be robust, scalable, and capable of learning and acting with varying levels of autonomy, to become integral components of networked sensors, knowledge bases, autonomous agents, and human teams. Three specific research gaps will be examined: (i) Learning in Complex Data Environments, (ii) Resource-constrained AI Processing at the Point-of-Need and (iii) Generalizable & Predictable AI. The talk will highlight ARL’s internal research efforts over the next 3-5 years that are connected, cumulative and converging to produce tactically-sensible AI-enabled capabilities for decision making at the tactical edge, specifically addressing topics in: (1) adversarial distributed machine learning, (2) robust inference & machine learning over heterogeneous sources, (3) adversarial reasoning integrating learned information, (4) adaptive online learning and (5) resource-constrained adaptive computing. The talk will also highlight collaborative research opportunities in AI & ML via ARL’s Army AI Innovation Institute (A2I2) which will harness the distributed research enterprise via the ARL Open Campus & Regional Campus initiatives. |
Tien Pham | Breakout | 2019
Waste Not, Want Not: A Methodological Illustration of Quantitative Text Analysis (Abstract)
“The wise use of one’s resources will keep one from poverty.” This is the definition of the proverbial saying “waste not, want not” according to www.dictionary.com. Indeed, one of the most common resources analysts encounter is free-form text. This text might come from survey comments, feedback, websites, transcriptions of interviews, videos, etcetera. Notably, researchers have wisely used the information conveyed in text for many years. However, in many instances, the qualitative methods employed require numerous hours of reading, training, coding, and validating, among others. As technology continues to evolve, simple access to text data is blooming. For example, analysts conducting online studies can have thousands of text entries from participants’ comments. Even without recent advances in technology, analysts have had access to text in books, letters, and other archival data for centuries. One important challenge, however, is figuring out how to make sense of text data without investing the resources, time, and effort involved in qualitative methodology or “old-school” quantitative approaches (such as reading a collection of 200 books and counting the occurrence of important terms in the text). This challenge has been solved in the information retrieval field (a branch of computer science) with the implementation of a technique called latent semantic analysis (LSA; Manning, Raghavan, & Schütze, 2008) and a closely related technique called topic analysis (TA; SAS Institute Inc., 2018). Undoubtedly, other quantitative methods for text analysis, such as latent Dirichlet allocation (Blei, Ng, & Jordan, 2003), are also apt for the task of unveiling knowledge from text data, but we restrict the discussion in this presentation to LSA and TA because these exclusively deal with the underlying structure of the text rather than identifying clusters. In this presentation, we aim to make quantitative text analysis, specifically LSA and TA, accessible to researchers and analysts from a variety of disciplines. We do this by leveraging understanding of a popular multivariate technique: principal components analysis (PCA). We start by describing LSA and TA by drawing comparisons and equivalencies to PCA. We make these comparisons in an intuitive, user-friendly manner and then through a technical description of mathematical statements, which rely on the singular value decomposition of a document-term matrix. Moreover, we explain the implementation of LSA and TA using statistical software to enable simple application of these techniques. Finally, we show a practical application of LSA and TA with empirical data of aircraft incidents. |
Laura Castro-Schilo | Breakout | 2019
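To give a flavor of the LSA/PCA connection described above, the toy sketch below builds a weighted document-term matrix and takes a truncated singular value decomposition, the same decomposition that underlies PCA; the four "documents" are invented and the weighting choice (TF-IDF) is just one common option.

```python
# Toy LSA: weighted document-term matrix + truncated SVD.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "engine fire during takeoff roll",
    "fire warning light after engine start",
    "hydraulic leak found during inspection",
    "inspection revealed hydraulic line chafing",
]
dtm = TfidfVectorizer().fit_transform(docs)   # document-term matrix
lsa = TruncatedSVD(n_components=2, random_state=0)
scores = lsa.fit_transform(dtm)               # documents in latent semantic space

print(np.round(scores, 2))   # documents 1-2 and 3-4 should land near each other
```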
Machine Learning Prediction With Streamed Sensor Data: Fitting Neural Networks using Functional Principal Components (Abstract)
Sensors that record sequences of measurements are now embedded in many products from wearable exercise watches to chemical and semiconductor manufacturing equipment. There is information in the shapes of the sensor stream curves that is highly predictive of a variety of outcomes such as the likelihood of a product failure event or batch yield. Despite this data now being common and readily available, it is often being used either inefficiently or not at all due to lack of knowledge and tools for how to properly leverage it. In this presentation, we will propose fitting splines to sensor streams and extracting features called functional principal component scores that offer a highly efficient low dimensional compression of the signal data. Then, we use these features as inputs into machine learning models like neural networks and LASSO regression models. Once one sees sensor data in this light, answering a wide variety of applied questions becomes a straightforward two stage process of data cleanup/functional feature extraction followed by modeling using those features as inputs. |
Chris Gotwalt | Breakout | 2019
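The two-stage workflow described above might look roughly like the sketch below: smooth each sensor stream, extract functional principal component scores, and feed the scores into a downstream model. The streams, outcome, and tuning choices are synthetic assumptions, not the presenter's workflow.

```python
# Stage 1: smooth each stream and extract functional PC scores.
# Stage 2: model an outcome (e.g., batch yield) from those scores.
import numpy as np
from scipy.interpolate import UnivariateSpline
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
n_runs = 60
shift = rng.uniform(-0.2, 0.2, n_runs)                   # hidden curve feature
streams = np.array([np.sin(2 * np.pi * (t - s)) + 0.1 * rng.normal(size=t.size)
                    for s in shift])
yields = 90 + 20 * shift + rng.normal(0, 1, n_runs)      # outcome to predict

smoothed = np.array([UnivariateSpline(t, row, s=1.0)(t) for row in streams])
scores = PCA(n_components=3).fit_transform(smoothed)     # low-dimensional features

model = LassoCV(cv=5).fit(scores, yields)
print("Training R^2:", round(model.score(scores, yields), 3))
```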
Screening Designs for Resource Constrained Deterministic M&S Experiments: A Munitions Case Study (Abstract)
In applications where modeling and simulation runs are quick and cheap, space-filling designs will give the tester all the information they need to make decisions about their system. In some applications, however, this luxury does not exist, and each M&S run can be time-consuming and expensive. In these scenarios, a sequential test approach provides an efficient solution where an initial screening is conducted, followed by an augmentation to fit specified models of interest. Until this point, no dedicated screening designs for UQ applications in resource-constrained situations existed. Due to the Army’s frequent exposure to this type of situation, this need sparked a collaboration between Picatinny’s Statistical Methods and Analysis group and Professor V. Roshan Joseph of Georgia Tech, where a new type of UQ screening design was created. This paper provides a brief introduction to the design, its intended use, and a case study in which this new methodology was applied. |
Christopher Drake | Breakout | 2019
The Isle of Misfit Designs: A Guided Tour of Optimal Designs That Break the Mold (Abstract)
Whether it was in a Design of Experiments course or through your own work, you’ve no doubt seen and become well acquainted with the standard experimental design. You know the features: they’re “orthogonal” (no messy correlations to deal with), their correlation matrices are nice pretty diagonals, and they can only happen with run sizes of 4, 8, 12, 16, and so on. Well, what if I told you that there exist optimal designs that defy convention? What if I told you that, yes, you can run an optimal design with, say, 5 factors in 9 runs? Or 10. Or even 11 runs! Join me as I show you a strange new world of optimal designs that are the best at what they do, even though they might not look very nice. |
Caleb King | Breakout | 2019
Adapting Operational Test to Rapid-Acquisition Programs (Abstract)
During the past several years, the DoD has begun applying rapid prototyping and fielding authorities—granted by Congress in the FY2016-FY2018 National Defense Authorization Acts (NDAA)—to many acquisition programs. Other programs have implemented an agile acquisition strategy where incremental capability is delivered in iterative cycles. As a result, Operational Test Agencies (OTA) have had to adjust their test processes to accommodate shorter test timelines and periodic delivery of capability to the warfighter. In this session, representatives from the Service OTAs will brief examples where they have implemented new practices and processes for conducting Operational Test on acquisition programs categorized as agile, DevOps, and/or Section 804 rapid-acquisition efforts. During the final 30 minutes of the session, a panel of OTA representatives will field questions from the audience concerning the challenges and opportunities related to test design, data collection, and analysis, that rapid-acquisition programs present. |
Panel Discussion | Breakout | 2019
Sample Size Calculations for Quiet Sonic Boom Community Surveys (Abstract)
NASA is investigating the dose-response relationship between quiet sonic boom exposure and community noise perceptions. This relationship is the key to possible future regulations that would replace the ban on commercial supersonic flights with a noise limit. We have built several Bayesian statistical models using pilot community study data. Using goodness of fit measures, we downselected to a subset of models which are the most appropriate for the data. From this subset of models we demonstrate how to calculate sample size requirements for a simplified example without any missing data. We also suggest how to modify the sample size calculation to account for missing data. |
Jasme Lee | Breakout | 2019
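The Bayesian models from the pilot studies are not reproduced here; the sketch below only illustrates the general simulation route to a sample size: assume a "true" dose-response curve, simulate surveys of size n, refit, and record how often a precision criterion is met. The logistic curve, dose range, and criterion are invented for illustration.

```python
# Simulation-based sample size for a (simplified, non-Bayesian) dose-response fit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
true_b0, true_b1 = -6.0, 0.08            # assumed dose-response on the logit scale

def one_survey(n):
    dose = rng.uniform(65, 95, size=n)                   # notional noise dose (dB)
    p = 1 / (1 + np.exp(-(true_b0 + true_b1 * dose)))
    annoyed = rng.binomial(1, p)
    fit = sm.Logit(annoyed, sm.add_constant(dose)).fit(disp=0)
    lo, hi = fit.conf_int()[1]                           # 95% CI for the slope
    return (hi - lo) < 0.06                              # precision criterion

for n in (200, 500, 1000):
    met = np.mean([one_survey(n) for _ in range(200)])
    print(f"n = {n:4d}: criterion met in {met:.0%} of simulated surveys")
```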
Improved Surface Gunnery Analysis with Continuous Data (Abstract)
Swarms of small, fast speedboats can challenge even the most capable modern warships, especially when they operate in or near crowded shipping lanes. As part of the Navy’s operational testing of new ships and systems, at-sea live-fire tests against remote-controlled targets allow us to test our capability against these threats. To ensure operational realism, these events are minimally scripted and allow the crew to respond in accordance with their training. This is a trade-off against designed experiments, which ensure statistically optimal sampling of data from across the factor space, but introduce many artificialities. A recent test provided data on the effectiveness of naval gunnery. However, standard binomial (hit/miss) analyses fell short, as the number of misses was much larger than the number of hits. This prevented us from fitting more than a few factors and resulted in error bars so large as to be almost useless. In short, binomial analysis taught us nothing we did not already know. Recasting gunfire data from binomial (hit/miss) to continuous (time-to-kill) allowed us to draw statistical conclusions with tactical implications from these free-play, live-fire surface gunnery events. Using a censored-data analysis approach enabled us to make this switch and avoid the shortcomings of other statistical methods. Ultimately, our analysis provided the Navy with suggestions for improvements to its tactics and the employment of its weapons. |
Benjamin Ashwell & V. Bram Lillard | Breakout | 2019
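A toy version of the recasting described above, with notional numbers: engagements that ended without a kill are treated as right-censored observation times, and a simple exponential time-to-kill model is fit by maximum likelihood. The talk's actual censored-data analysis is not reproduced here.

```python
# Time-to-kill with right censoring, exponential model. Data are notional.
import numpy as np

time = np.array([38, 55, 20, 90, 90, 47, 90, 62, 90, 31.0])  # seconds observed
killed = np.array([1,  1,  1,  0,  0,  1,  0,  1,  0,  1])   # 0 = censored at 90 s

rate = killed.sum() / time.sum()          # exponential MLE: events / total exposure
print(f"Estimated kill rate: {rate:.4f} per second")
print(f"Median time-to-kill: {np.log(2) / rate:.1f} seconds")
print(f"P(kill within 60 s): {1 - np.exp(-rate * 60):.2f}")
```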
Human in the Loop Experiment Series Evaluating Synthetic Vision Displays for Enhanced Airplane State Awareness (Abstract)
Recent data from Boeing’s Statistical Summary of Commercial Jet Airplane Accidents shows that Loss of Control – In Flight (LOC-I) is the leading cause of fatalities in commercial aviation accidents worldwide. The Commercial Aviation Safety Team (CAST), a joint government and industry effort tasked with reducing the rate of fatal accidents, requested that the National Aeronautics and Space Administration (NASA) conduct research on virtual day-visual meteorological conditions displays, such as synthetic vision, in order to combat LOC-I. NASA recently concluded a series of experiments using commercial pilots from various backgrounds to evaluate synthetic vision displays. This presentation will focus on the two most recent experiments: one conducted with the Navy’s Disorientation Research Device and one completed at NASA Langley Research Center that utilized the Microsoft HoloLens to display synthetic vision. Statistical analysis was done on aircraft performance data, pilot inputs, and a range of subjective questionnaires to assess the efficacy of the displays. |
Kathryn Ballard | Breakout | 2019
Statistical Engineering and M&S in the Design and Development of DoD Systems (Abstract)
This presentation will use a notional armament system case study to illustrate the use of M&S DOE, surrogate modeling, sensitivity analysis, multi-objective optimization and model calibration during early lifecycle development and design activities in the context of a new armament system. In addition to focusing on the statistician’s, data scientist’s, or analyst’s role and the key statistical techniques in engineering DoD systems, this presentation will also emphasize the non-statistical / engineering domain-specific aspects in a multidisciplinary design and development process which makes use of these statistical approaches at the subcomponent and subsystem levels as well as in end-to-end system modeling. A statistical engineering methodology which emphasizes the use of ‘virtual’ DOE-based model emulators developed at the subsystem-level and integrated using a systems-engineering architecture framework can yield a more tractable engineering problem compared to traditional ‘design-build-test-fix’ cycles or direct simulation of computationally expensive models. This supports a more informed prototype design for physical experimentation while providing a greater variety of materiel solutions, thereby reducing development and testing cycles and time to field complex systems. |
Doug Ray & Melissa Jablonski | Breakout | 2019
Uncertainty Quantification: Combining Large Scale Computational Models with Physical Data for Inference (Abstract)
Combining physical measurements with computational models is key to many investigations involving validation and uncertainty quantification (UQ). This talk surveys some of the many approaches taken for validation and UQ with large-scale computational models. Experience with such applications suggests classifications of different types of problems with common features (e.g. data size, amount of empiricism in the model, computational demands, availability of data, extent of extrapolation required, etc.). More recently, social and socio-technical systems are being considered for similar analyses, bringing new challenges to this area. This talk will discuss approaches for such problems and will highlight what might be new research directions for application and methodological development in UQ. |
Dave Higdon | Breakout | 2019
A User-Centered Design Approach to Military Software Development (Abstract)
This case study highlights activities performed during the front-end process of a software development effort undertaken by the Fire Support Command and Control Program Office. This program office provides the U.S. Army, Joint and coalition commanders with the capability to plan, execute and deliver both lethal and non-lethal fires. Recently, the program office has undertaken modernization of its primary field artillery command and control system that has been in use for over 30 years. The focus of this case study is on the user-centered design process and activities taken prior to and immediately following contract award. A modified waterfall model comprised of three cyclic, yet overlapping phases (observation, visualization, and evaluation) provided structure for the iterative, user-centered design process. Gathering and analyzing data collected during focus groups, observational studies, and workflow process mapping, enabled the design team to identify 1) design patterns across the role/duty, unit and echelon matrix (a hierarchical organization structure), 2) opportunities to automate manual processes, 3) opportunities to increase efficiencies for fire mission processing, 4) bottlenecks and workarounds to be eliminated through design of the modernized system, 5) shortcuts that can be leveraged in design, 6) relevant and irrelevant content for each user population for streamlining access to functionality, 7) a usability baseline for later comparison (e.g., the number of steps and time taken to perform a task as captured in workflows for comparison to the same task in the modernized system), and provided the basis for creating visualizations using wireframes. Heuristic evaluations were conducted early to obtain initial feedback from users. In the next few months, usability studies will enable users to provide feedback based on actual interaction with the newly designed software. Included in this case study are descriptions of the methods used to collect user-centered design data, how results were visualized/documented for use by the development team, and lessons learned from applying user-centered design techniques during software development of a military field artillery command and control system. |
Pam Savage-Knepshield | Breakout | 2019
Engineering first, Statistics second: Deploying Statistical Test Optimization (STO) for Cyber (Abstract)
Due to the immense number of potential use cases, configurations, and threat behaviors, thorough and efficient cyber testing is a significant challenge for the defense community. In this presentation, Phadke will present case studies where STO was successfully deployed for cyber testing, resulting in higher assurance, reduced schedule, and reduced testing cost. Phadke will also discuss the importance of first focusing on the engineering and science analysis and, only after that is complete, implementing statistical methods. |
Kedar Phadke | Breakout | 2019
Validation and Uncertainty Quantification of Complex Models (Abstract)
Advances in high performance computing have enabled detailed simulations of real-world physical processes, and these simulations produce large datasets. Even as detailed as they are, these simulations are only approximations of imperfect mathematical models, and furthermore, their outputs depend on inputs that are themselves uncertain. The main goal of a validation and uncertainty quantification methodology is to determine the uncertainty, that is, the relationship between the true value of a quantity of interest and its prediction by the simulation. The value of the computational results is limited unless the uncertainty can be quantified or bounded. Bayesian calibration is a common method for estimating model parameters and quantifying their associated uncertainties; however, calibration becomes more complicated when the data arise from different types of experiments. Using an example from materials science, we will employ two types of data and demonstrate how one can obtain a set of material strength models that agree with both data sources. |
Kassie Fronczyk | Breakout | 2019
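The material-strength application itself is not reproduced here; the sketch below only shows the generic mechanism for calibrating one parameter against two data types: the log-posterior adds a likelihood term for each experiment, and a short random-walk Metropolis sampler draws from the result. The "models", data, and noise levels are synthetic placeholders.

```python
# Bayesian calibration of one parameter informed by two experiment types.
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.0
data_a = theta_true * 1.0 + rng.normal(0, 0.3, size=15)   # experiment type A
data_b = theta_true * 0.5 + rng.normal(0, 0.1, size=8)    # experiment type B

def log_post(theta):
    if not (0 < theta < 10):                               # uniform prior support
        return -np.inf
    lp_a = -0.5 * np.sum((data_a - theta * 1.0) ** 2) / 0.3**2
    lp_b = -0.5 * np.sum((data_b - theta * 0.5) ** 2) / 0.1**2
    return lp_a + lp_b

samples, theta, lp = [], 1.0, log_post(1.0)
for _ in range(20000):                                     # random-walk Metropolis
    prop = theta + rng.normal(0, 0.05)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta)

post = np.array(samples[5000:])                            # discard burn-in
print(f"Posterior mean {post.mean():.3f}, 95% interval "
      f"({np.quantile(post, 0.025):.3f}, {np.quantile(post, 0.975):.3f})")
```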
Communicating Statistical Concepts and Results: Lessons Learned from the US Service Academies (Abstract)
Communication is critical both for analysts and decision-makers who rely on analysis to inform their choices. The Service Academies are responsible for educating men and women who may serve in both roles over the course of their careers. Analysts must be able to summarize their results concisely and communicate them to the decision-maker in a way that is relevant and actionable. Decision-makers must understand that analytical results may carry with them uncertainty and be able to incorporate this uncertainty properly when evaluating different options. This panel explores the role of the US Service Academies in preparing their students for both roles. Featuring representatives from the US Air Force Academy, the US Naval Academy, and the US Military Academy, this panel will cover how future US Officers are taught to use and communicate with data. Topics include developing and motivating numerical literacy, understandings of uncertainty, how analysts should frame uncertainty to decision-makers, and how decision-makers should understand information presented with uncertainty. Panelists will discuss what they think the academies do well and areas that are ripe for improvement. |
Panel Discussion | Breakout | 2019
Multivariate Density Estimation and Data-enclosing Sets Using Sliced-Normal Distributions (Abstract)
This talk focuses on a means to characterize the variability in multivariate data. This characterization, given in terms of both probability distributions and closed sets, is instrumental in assessing and improving the robustness/reliability properties of system designs. To this end, we propose the Sliced-Normal (SN) class of distributions. The versatility of SNs enables modeling complex multivariate dependencies with minimal modeling effort. A polynomial mapping is defined which injects the physical space into a higher dimensional (so-called) feature space on which a suitable normal distribution is defined. Optimization-based strategies for the estimation of SNs from data in both physical and feature space are proposed. The formulations in physical space yield non-convex optimization programs whose solutions often outperform the solutions in feature space. However, the formulations in feature space yield either an analytical solution or a convex program thereby facilitating their application to problems in high dimensions. The superlevel sets of a SN density have a closed semi-algebraic form making them amenable to rigorous uncertainty quantification methods. Furthermore, we propose a chance-constrained optimization framework for identifying and eliminating the effects of outliers in the prescription of such regions. These strategies can be used to mitigate the conservatism intrinsic to many methods in system identification, fault detection, robustness/reliability analysis, and robust design caused by assuming parameter independence and by including outliers in the dataset. |
Luis Crespo | Breakout | 2019
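A much-simplified sketch of the feature-space idea: map the data through a polynomial expansion, fit a Gaussian there, and score points in physical space by their feature-space Mahalanobis distance, whose sublevel sets give a data-enclosing region. Normalization constants and the optimization-based refinements described in the talk are omitted, and the data are synthetic.

```python
# Simplified Sliced-Normal-style fit: Gaussian in polynomial feature space.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 400)
data = np.column_stack([np.cos(t), np.sin(2 * t)]) + 0.05 * rng.normal(size=(400, 2))

poly = PolynomialFeatures(degree=4, include_bias=False)
Z = poly.fit_transform(data)                        # physical -> feature space
mu, cov = Z.mean(axis=0), np.cov(Z, rowvar=False)
prec = np.linalg.pinv(cov)

def score(x):
    """Feature-space squared Mahalanobis distance (lower = denser)."""
    z = poly.transform(np.atleast_2d(x)) - mu
    return np.einsum("ij,jk,ik->i", z, prec, z)

threshold = np.quantile(score(data), 0.95)          # data-enclosing set boundary
print("(1.0, 0.0) inside set:", bool(score([[1.0, 0.0]])[0] <= threshold))
print("(0.0, 0.9) inside set:", bool(score([[0.0, 0.9]])[0] <= threshold))
```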
A Statistical Approach for Uncertainty Quantification with Missing Data (Abstract)
Uncertainty quantification (UQ) has emerged as the science of quantitative characterization and reduction of uncertainties in simulation and testing. Stretching across applied mathematics, statistics, and engineering, UQ is a multidisciplinary field with broad applications. A popular UQ method to analyze the effects of input variability and uncertainty on the system responses is generalized Polynomial Chaos Expansion (gPCE). This method was developed using applied mathematics and does not require knowledge of a simulation’s physics. Thus, gPCE may be used across disparate industries and is applicable to both individual component and system level simulations. The gPCE method can encounter problems when any of the input configurations fail to produce valid simulation results. gPCE requires that results be collected on a sparse grid Design of Experiment (DOE), which is generated based on probability distributions of the input variables. A failure to run the simulation at any one input configuration can result in a large decrease in the accuracy of a gPCE. In practice, simulation data sets with missing values are common because simulations regularly yield invalid results due to physical restrictions or numerical instability. We propose a statistical approach to mitigating the cost of missing values. This approach yields accurate UQ results if simulation failure makes gPCE methods unreliable. The proposed approach addresses this missing data problem by introducing an iterative machine learning algorithm. This methodology allows gPCE modelling to handle missing values in the sparse grid DOE. The study will demonstrate the convergence characteristics of the methodology to reach steady state values for the missing points using a series of simulations and numerical results. Remarks about the convergence rate and the advantages and feasibility of the proposed methodology will be provided. Several examples are used to demonstrate the proposed framework and its utility including a secondary air system example from the jet engine industry and several non-linear test functions. This is based on joint work with Dr. Mark Andrews at SmartUQ. |
Mark Andrews | Breakout | 2019
Exploring Problems in Shipboard Air Defense with Modeling (Abstract)
One of the primary roles of navy surface combatants is defending high-value units against attack by anti-ship cruise missiles (ASCMs). They accomplish this either by launching their interceptor missiles and shooting the ASCMs down with rapid-firing guns (hard kill), or through the use of deceptive jamming, decoys, or other non-kinetic means (soft kill) to defeat the threat. The wide range of hostile ASCM capabilities and the different properties of friendly defenses, combined with the short time-scale for defeating these ASCMs, makes this a difficult problem to study. IDA recently completed a study focusing on the extent to which friendly forces were vulnerable to massed ASCM attacks, and possible avenues for improvement. To do this we created a pair of complementary models with the combined flexibility to explore a wide range of questions. The first model employed a set of closed-form equations, and the second a time-dependent Monte Carlo simulation. This presentation discusses the thought processes behind the models and their relative strengths and weaknesses. |
Ralph Donnelly & Benjamin Ashwell | Breakout | 2019
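In the spirit of the second (time-dependent Monte Carlo) model, though far simpler than the models described above, the sketch below steps a salvo of ASCMs toward a ship that fires interceptors at a fixed rate with a fixed single-shot kill probability. Every number is a placeholder, not a real system parameter.

```python
# Notional time-stepped Monte Carlo of a ship defending against an ASCM salvo.
import numpy as np

def one_engagement(rng, n_ascm=8, ascm_speed=0.3, detect_range=40.0,
                   shot_interval=8.0, p_kill=0.7, dt=0.5):
    ranges = np.full(n_ascm, detect_range)      # km from the ship
    alive = np.ones(n_ascm, dtype=bool)
    next_shot, t = 0.0, 0.0
    while alive.any() and ranges[alive].min() > 0:
        if t >= next_shot:                      # engage the closest live threat
            target = np.argmin(np.where(alive, ranges, np.inf))
            if rng.uniform() < p_kill:
                alive[target] = False
            next_shot = t + shot_interval
        ranges[alive] -= ascm_speed * dt        # remaining threats keep closing
        t += dt
    return int(not alive.any())                 # 1 if every threat was defeated

rng = np.random.default_rng(0)
wins = np.mean([one_engagement(rng) for _ in range(2000)])
print(f"All threats defeated in {wins:.0%} of trials")
```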
Sequential Testing for Fast Jet Life Support Systems (Abstract)
The concept of sequential testing has many disparate meanings. Often, for statisticians it takes on a purely mathematical context while possibly meaning multiple disconnected test events for some practitioners. Here we present a pedagogical approach to creating test designs involving constrained factors using JMP software. Recent experiences testing one of the U.S. military’s fast jet life support systems (LSS) serve as a case study and backdrop to support the presentation. The case study discusses several lessons learned during LSS testing, applicable to all practitioners of scientific test and analysis techniques (STAT) and design of experiments (DOE). We conduct a short analysis to specifically determine a test region with a set of factors pertinent to modeling human breathing and the use of breathing machines as part of the laboratory setup. A comparison of several government and industry laboratory test points and regions with governing documentation is made, along with our proposal for determining a necessary and sufficient test region for tests involving human breathing as a factor. |
Darryl Ahner, Steven Thorsen & Sarah Burke | Breakout | 2019
Bayesian Component Reliability Estimation: F-35 Case Study (Abstract)
A challenging aspect of a system reliability assessment is integrating multiple sources of information, including component, subsystem, and full-system data, previous test data, or subject matter expert opinion. A powerful feature of Bayesian analyses is the ability to combine these multiple sources of data and variability in an informed way to perform statistical inference. This feature is particularly valuable in assessing system reliability where testing is limited and only a small number of failures (or none at all) are observed. The F-35 is DoD’s largest program; approximately one-third of the operations and sustainment cost is attributed to the cost of spare parts and the removal, replacement, and repair of components. The failure rate of those components is the driving parameter for a significant portion of the sustainment cost, and yet for many of these components, poor estimates of the failure rate exist. For many programs, the contractor produces estimates of component failure rates, based on engineering analysis and legacy systems with similar parts. While these are useful, the actual removal rates can provide a more accurate estimate of the removal and replacement rates the program expects to experience in future years. In this presentation, we show how we applied a Bayesian analysis to combine the engineering reliability estimates with the actual failure data to overcome the problems of cases where few data exist. Our technique is broadly applicable to any program where multiple sources of reliability information need to be combined for the best estimation of component failure rates and ultimately sustainment costs. |
V. Bram Lillard & Rebecca Medlin | Breakout | 2019
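One common way to combine an engineering failure-rate estimate with observed field data, sketched here with notional numbers rather than F-35 data, is a conjugate Gamma prior centered on the engineering estimate and updated by a Poisson likelihood over accumulated flight hours; the presenters' actual model may differ.

```python
# Gamma prior (engineering estimate) + Poisson likelihood (observed removals).
import numpy as np
from scipy import stats

eng_rate = 1.0 / 2000.0        # engineering estimate: 1 failure per 2000 hours
prior_strength = 500.0         # "pseudo-hours" of confidence in that estimate
alpha0, beta0 = eng_rate * prior_strength, prior_strength

observed_failures = 9
flight_hours = 12000.0

posterior = stats.gamma(a=alpha0 + observed_failures,
                        scale=1.0 / (beta0 + flight_hours))
print(f"Prior mean rate:     {alpha0 / beta0:.2e} per hour")
print(f"Posterior mean rate: {posterior.mean():.2e} per hour")
print(f"95% credible interval: ({posterior.ppf(0.025):.2e}, {posterior.ppf(0.975):.2e})")
```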
Applying Functional Data Analysis throughout Aerospace Testing (Abstract)
Sensors abound in aerospace testing and while many scientists look at the data from a physics perspective, the comparative statistics information is what drives decisions. A multi-company project was comparing launch data from the 1980’s to a current set of data that included 30 sensors. Each sensor was designed to gather 3000 data points during the 3-second launch event. The data included temperature, acceleration, and pressure information. This talk will compare the data analysis methods developed for this project as well as the use of the new Functional Data Analysis tool within JMP for its ability to discern in-family launch performances. |
David Harrison | Breakout | 2019
Adopting Optimized Software Test Design Methods at Scale (Abstract)
Using Combinatorial Test Design methods to select software test scenarios has repeatedly delivered large efficiency and thoroughness gains – which begs the questions: • Why are these proven methods not used everywhere? • Why do some efforts to promote adoption of new approaches stagnate? • What steps can leaders take to successfully introduce and spread new test design methods? For more than a decade, Justin Hunter has helped large global organizations across six continents adopt new test design techniques at scale. Working in some environments, he has felt like Sisyphus, forever condemned to roll a boulder uphill only to watch it roll back down again. In other situations, things clicked; teams smoothly adopted new tools and techniques, and impressive results were quickly achieved. In this presentation, Justin will discuss several common challenges faced by large organizations, explain why adopting test design tools is more challenging than adopting other types of development and testing tools, and share actionable recommendations to consider when you roll out new test design approaches. |
Justin Hunter | Breakout | 2019
A Quantitative Assessment of the Science Robustness of the Europa Clipper Mission (Abstract)
Existing characterization of Europa’s environment is enabled by the Europa Clipper mission’s successful predecessors: Pioneer, Voyager, Galileo, and most recently, Juno. These missions reveal high-intensity energetic particle fluxes at Europa’s orbit, posing a multidimensional design challenge for ensuring mission success (i.e., meeting Level 1 science requirements). Risk-averse JPL Design Principles and the Europa Environment Requirement Document (ERD) dictate practices and policy which, if masterfully followed, are designed to protect Clipper from failure or degradation due to radiation. However, even if workmanship is flawless and no waivers are assessed, modeling errors, shielding uncertainty, and natural variation in the Europa environment are cause for residual concern. While failure and part degradation are of paramount concern, the occurrence of temporary outages, causing loss or degradation of science observations, is also a critical mission risk, left largely unmanaged by documents like the ERD. The referenced risk is monitored and assessed through a Project Systems Engineering-led mission robustness effort, which attempts to balance the risk of science data loss against the potential design cost and increased mission complexity required to mitigate such risk. The Science Sensitivity Model (SSM) was developed to assess mission and science robustness, with its primary goal being to ensure a high probability of achieving Level 1 (L1) science objectives by informing the design of a robust spacecraft, instruments, and mission design. This discussion will provide an overview of the problem, the model, and solution strategies. Subsequent presentations discuss the experimental design used to understand the problem space and the graphics and visualization used to reveal important conclusions. |
Kelli McCoy | Breakout | 2019
Identifying and Contextualizing Maximum Instrument Fault Rates and Minimum Instrument Recovery Times for Europa Clipper Science through Applied Statistics and Strategic Visualizations (Abstract)
Using the right visualizations as part of broad system and statistical Monte Carlo analysis supports interpretation of key drivers and relationships between variables, provides context about the full system, and communicates to non-statistician stakeholders. An experimental design was used to understand the relationships between instrument and spacecraft fault rate and recovery time in relation to the probability of achieving Europa Clipper science objectives during the Europa Clipper tour. Given spacecraft and instrument outages, requirement achievement checks were performed to determine the probability of meeting scientific objectives. Visualizations of the experimental design output enabled analysis of the full parameter set. Correlation between individual instruments and specific scientific objectives is not straightforward; some scientific objectives require a single instrument to be on at certain times and during varying conditions across the trajectory, while other science objectives require multiple instruments to function concurrently. By examining the input conditions that meet scientific objectives with the highest probability, and comparing those to trials with the lowest probability of meeting scientific objectives, key relationships could be visualized, enabling valuable mission and engineering design insights. Key system drivers of scientific success were identified, such as the fault rate tolerance and recovery time required for each instrument and the spacecraft. Key steps, methodologies, difficulties, and result highlights are presented, along with a discussion of next steps and options for refinement and future analysis. |
Thomas Youmans | Breakout | 2019
Design and Analysis of Experiments for Europa Clipper’s Science Sensitivity Model (Abstract)
The Europa Clipper Science Sensitivity Model (SSM) can be thought of as a graph in which the nodes are mission requirements at ten levels in a hierarchy, and edges represent how requirements at one level of the hierarchy depend on those at lower levels. At the top of the hierarchy, there are ten nodes representing ten Level 1 science requirements for the mission. At the bottom of the hierarchy, there are 100 or so nodes representing instrument-specific science requirements. In between, nodes represent intermediate science requirements with complex interdependencies. Meeting, or failing to meet, bottom-level requirements depends on the frequency of faults and the lengths of recovery times on the nine Europa Clipper instruments and the spacecraft. Our task was to design and analyze the results of a Monte Carlo experiment to estimate the probabilities of meeting the Level 1 science requirements based on parameters of the distributions of time between failures and of recovery times. We simulated an ensemble of synthetic missions in which failures and recoveries were random realizations from those distributions. The pass-fail status of the bottom-level instrument-specific requirements was propagated up the graph for each of the synthetic missions. Aggregating over the collection of synthetic missions produced estimates of the pass-fail probabilities for the Level 1 requirements. We constructed a definitive screening design and supplemented it with additional space-filling runs, using JMP 14 software. Finally, we used the vectors of failure and recovery parameters as predictors, and the pass-fail probabilities of the high-level requirements as responses, and built statistical models to predict the latter from the former. In this talk, we will describe the design considerations and review the fitted models and their implications for mission success. |
Amy Braverman | Breakout | 2019
SLS Structural Dynamics Sensor Optimization Study (Abstract)
A crucial step in the design and development of a flight vehicle, such as NASA’s Space Launch System (SLS), is understanding its vibration behavior while in flight. Vehicle designers rely on low-cost finite element analysis (FEA) to predict the vibration behavior of the vehicle. During ground and flight tests, sensors are strategically placed at predefined locations that contribute the most vibration information under the assumption that FEA is accurate, producing points to validate the FEA models. This collaborative work focused on developing optimal sensor placement algorithms to validate FEA models against test data, and to characterize the vehicle’s vibration characteristics. |
Ken Toro & Jon Stallrich | Breakout | 2019
The 80/20 rule, can and should we break it using efficient data management tools? (Abstract)
Data scientists spend approximately 80% of their time preparing, cleaning, and feature engineering data sets. In this talk, I will share use cases that show why this is important and why we need to do it. I will also describe the Earth System Grid Federation (ESGF), which is an open-source effort providing a robust, distributed data and computation platform, enabling worldwide access to Peta/Exa-scale scientific data. ESGF will help reduce the amount of effort needed for climate data preprocessing by integrating the necessary analysis and data sharing tools. |
Ghaleb Abdulla | Breakout | 2019
Time Machine Learning: Getting Navy Maintenance Duration Right (Abstract)
In support of the Navy’s effort to obtain improved outcomes through data-driven decision making, the Center for Naval Analyses’ Data Science Program (CNA/DSP) supports the performance-to-plan (P2P) forum, which is co-chaired by the Vice Chief of Naval Operations and the Assistant Secretary of the Navy (RD&A). The P2P forum provides senior Navy leadership forward-looking performance forecasts, which are foundational to articulating Navy progress toward readiness and capability goals. While providing analytical support for this forum, CNA/DSP leveraged machine learning techniques, including Random Forests and Artificial Neural Networks, to develop improved estimates of future maintenance durations for the Navy. When maintenance durations exceed their estimated timelines, these delays can affect training, manning, and deployments in support of operational commanders. Currently, the Navy creates maintenance estimates during numerous timeframes including the program objective memorandum (POM) process, the Presidential Budget (PB), and at contract award, leading to evolving estimates over time. The limited historical accuracy of these estimates, especially the POM and PB estimates, has persisted over the last decade. These errors have led to a gap between planned funding and actual costs in addition to changes in the assets available for operational commanders each year. The CNA/DSP prediction model reduces the average error in forecasted maintenance duration days from 128 days to 31 days for POM estimates. Improvements in duration accuracy for the PB and contract award time frames were also achieved using similar ML processes. The data curation for these models involved numerous data sources of varying quality and required significant feature engineering to provide usable model inputs that could allow for forecasts over the Future Years Defense Program (FYDP) in order to support improved resource allocation and scheduling in support of the optimized fleet response training plan (OFRTP). |
Tim Kao | Breakout |
![]() | 2019 | ||||||||
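The sketch below is a minimal illustration of the random-forest approach mentioned in the abstract above, not the CNA/DSP model: the features, effect sizes, and data are entirely synthetic stand-ins for the Navy maintenance data.

```python
# Minimal sketch (synthetic data): predicting maintenance duration with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Hypothetical planning features: planned duration, ship age, work items, prior delay
X = np.column_stack([
    rng.uniform(100, 400, n),   # planned duration (days)
    rng.uniform(5, 35, n),      # ship age (years)
    rng.poisson(200, n),        # number of work items
    rng.uniform(0, 120, n),     # delay on previous availability (days)
])
# Synthetic "actual" duration driven by all four features plus noise
y = X[:, 0] + 1.5 * X[:, 1] + 0.1 * X[:, 2] + 0.5 * X[:, 3] + rng.normal(0, 20, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
forest = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_train, y_train)
print("mean absolute error (days):",
      round(mean_absolute_error(y_test, forest.predict(X_test)), 1))
```

In practice the gain comes from the feature engineering the abstract emphasizes; the model-fitting step itself is only a few lines.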
Behavioral Analytics: Paradigms and Performance Tools of Engagement in System Cybersecurity (Abstract)
The application opportunities for behavioral analytics in the cybersecurity space are based upon simple realities. 1. The great majority of breaches across all cybersecurity venues are due to human choices and human error. 2. With communication and information technologies making for rapid availability of data, as well as behavioral strategies of bad actors getting cleverer, there is a need for expanded perspectives in cybersecurity prevention. 3. Internally-focused paradigms must now be explored that place endogenous protection from security threats as an important focus and integral dimension of cybersecurity prevention. The development of cybersecurity monitoring metrics and tools as well as the creation of intrusion prevention standards and policies should always include an understanding of the underlying drivers of human behavior. As temptation follows available paths, cyber-attacks follow technology, business models, and behavioral habits. The human element will always be the most significant part in the anatomy of any final decision. Choice options – from input, to judgment, to prediction, to action – need to be better understood for their relevance to cybersecurity work. Behavioral Performance Indexes harness data about aggregate human participation in an active system, helping to capture some of the detail and nuances of this critically important dimension of cybersecurity. |
Robert Gough | Breakout |
![]() | 2019 | ||||||||
3D Mapping, Plotting, and Printing in R with Rayshader (Abstract)
Is there ever a place for the third dimension in visualizing data? Is the use of 3D inherently bad, or can a 3D visualization be used as an effective tool to communicate results? In this talk, I will show you how you can create beautiful 2D and 3D maps and visualizations in R using the rayshader package. Additionally, I will talk about the value of 3D plotting and how good aesthetic choices can more clearly communicate results to stakeholders. Rayshader is a free and open source package for transforming geospatial data into engaging visualizations using a simple, scriptable workflow. It provides utilities to interactively map, plot, and 3D print data from within R. It was nominated by Hadley Wickham to be one of 2018’s Data Visualizations of the Year for the online magazine Quartz. |
Tyler Morgan-Wall | Breakout | 2019 | |||||||||
Functional Data Analysis for Design of Experiments (Abstract)
With nearly continuous recording of sensor values now common, a new type of data called “functional data” has emerged. Rather than the individual readings being modeled, the shape of the stream of data over time is being modeled. As an example, one might model many historical vibration-over-time streams of a machine at start-up to identify functional data shapes associated with the onset of system failure. Functional Principal Components (FPC) analysis is a new and increasingly popular method for reducing the dimensionality of functional data so that only a few FPCs are needed to closely approximate any of a set of unique data streams. When combined with Design of Experiments (DoE) methods, the response to be modeled in as few tests as possible is now the shape of a stream of data over time. Example analyses will be shown where the form of the curve is modeled as a function of several input variables, allowing one to determine the input settings associated with shapes indicative of good or poor system performance. This allows the analyst to predict the shape of the curve as a function of the input variables. A brief illustrative code sketch follows this entry. |
Tom Donnelly | Breakout |
![]() | 2019 | ||||||||
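As a rough illustration of the dimensionality reduction described above, the sketch below (synthetic curves, not the talk’s data) extracts functional principal components from a set of data streams via an SVD of the centered curves; the resulting scores are the low-dimensional responses one could then model against DoE factors.

```python
# Minimal sketch: functional principal components (FPCs) via an SVD of centered curves.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 200)                  # common time grid
n_curves = 60
# Synthetic vibration-like streams built from two underlying modes plus noise
scores_true = rng.normal(size=(n_curves, 2))
modes = np.vstack([np.sin(2 * np.pi * t), np.cos(4 * np.pi * t)])
curves = scores_true @ modes + 0.1 * rng.normal(size=(n_curves, t.size))

mean_curve = curves.mean(axis=0)
centered = curves - mean_curve
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

explained = s**2 / np.sum(s**2)
fpcs = Vt[:2]                               # first two FPC "shapes"
fpc_scores = centered @ fpcs.T              # scores usable as DoE responses
print("variance explained by first two FPCs:", explained[:2].round(3))
```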
Test and Evaluation of Emerging Technologies |
Dr. Greg Zacharias, Chief Scientist Operational Test and Evaluation | Breakout |
![]() | 2019 | ||||||||
Challenges in Test and Evaluation of AI: DoD’s Project Maven (Abstract)
The Algorithmic Warfare Cross Functional Team (AWCFT or Project Maven) organizes DoD stakeholders to enhance intelligence support to the warfighter through the use of automation and artificial intelligence. The AWCFT’s objective is to turn the enormous volume of data available to DoD into actionable intelligence and insights at speed. This requires consolidating and adapting existing algorithm-based technologies as well as overseeing the development of new solutions. This brief will describe some of the methodological challenges in test and evaluation that the Maven team is working through to facilitate speedy and agile acquisition of reliable and effective AI / ML capabilities. |
Jane Pinelis | Breakout | 2019 | |||||||||
Demystifying the Black Box: A Test Strategy for Autonomy (Abstract)
Systems with autonomy are beginning to permeate civilian, industrial, and military sectors. Though these technologies have the potential to revolutionize our world, they also bring a host of new challenges in evaluating whether these tools are safe, effective, and reliable. The Institute for Defense Analyses is developing methodologies to enable testing systems that can, to some extent, think for themselves. In this talk, we share how we think about this problem and how this framing can help you develop a test strategy for your own domain. |
Dan Porter | Breakout |
![]() | 2019 | ||||||||
Satellite Affordability in LEO (SAL) (Abstract)
The Satellite Affordability in LEO (SAL) model identifies the cheapest constellation capable of providing a desired level of performance within certain constraints. SAL achieves this using a combination of analytical models, statistical emulators, and geometric relationships. SAL is flexible and modular, allowing users to customize certain components while retaining default behavior in other cases. This is desirable if users wish to consider an alternative cost formulation or different types of payload. Uses for SAL include examining cost tradeoffs with respect to factors like constellation size and desired performance level, evaluating the sensitivity of constellation costs to different assumptions about cost behavior, and providing a first-pass look at what proliferated smallsats might be capable of. At this point, SAL is limited to Walker constellations with sun-synchronous, polar orbits. |
Matthew Avery | Breakout |
![]() | 2019 | ||||||||
Statistical Process Control and Capability Study on the Water Content Measurements in NASA Glenn’s Icing Research Tunnel (Abstract)
The Icing Research Tunnel (IRT) at NASA Glenn Research Center follows the recommended practice for icing tunnel calibration outlined in SAE’s ARP5905 document. The calibration team has followed the schedule of a full calibration every five years with a check calibration done every six months following. The liquid water content of the IRT has maintained stability within the specification presented to customers: variation within +/- 10% of the calibrated target measurement. With recent measurement and instrumentation errors, a more thorough assessment of error sources was desired. By constructing statistical process control charts, the ability to determine how the instrument varies in the short, mid, and long term was gained. The control charts offer a view of instrument error, facility error, or installation changes. It was discovered that there was a shift from the target to the baseline mean, leading to a study of the overall capability indices of the liquid water content measuring instrument to perform within the specifications defined for the IRT. This presentation describes data processing procedures for the Multi-Element Sensor in the IRT, including collision efficiency corrections, canonical correlation analysis, Chauvenet’s Criterion for rejection of data, distribution checks of the data, and the mean, median, and mode for construction of control charts. Further data are presented to describe the repeatability of the IRT with the Multi-Element Sensor and the ability to maintain a stable process for the defined calibration schedule. A brief illustrative code sketch follows this entry. |
Emily Timko | Breakout | 2019 | |||||||||
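The sketch below illustrates the generic calculations involved above (an individuals-chart estimate of short-term variation and a capability index against +/-10% specification limits); the numbers are synthetic stand-ins, not IRT data, and d2 = 1.128 is the standard moving-range constant for n = 2.

```python
# Minimal sketch (synthetic data): individuals-chart limits and a Cpk for
# liquid water content (LWC) measurements against +/-10% specification limits.
import numpy as np

rng = np.random.default_rng(2)
target = 1.0                                        # calibrated target LWC, illustrative
lwc = rng.normal(loc=0.98, scale=0.02, size=50)     # repeated check measurements

center = lwc.mean()
mr_bar = np.abs(np.diff(lwc)).mean()                # average moving range
sigma_hat = mr_bar / 1.128                          # short-term sigma estimate (d2 for n = 2)
ucl, lcl = center + 3 * sigma_hat, center - 3 * sigma_hat

usl, lsl = 1.10 * target, 0.90 * target             # +/-10% specification
cpk = min(usl - center, center - lsl) / (3 * sigma_hat)
print(f"center={center:.3f}, LCL={lcl:.3f}, UCL={ucl:.3f}, Cpk={cpk:.2f}")
```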
Reasoning about Uncertainty with the Stan Modeling Language (Abstract)
This briefing discusses the practical advantages of using the probabilistic programming language (PPL) Stan to answer statistical questions, especially those related to the quantification of uncertainty. Stan is a relatively new statistical tool that allows users to specify probability models and reason about the processes that generate the data they encounter. Stan has quickly become a popular language for writing statistical models because it allows one to specify rich (or sparse) Bayesian models using a high-level language. Further, Stan is fast, memory efficient, and robust. Stan requires users to be explicit about the model they wish to evaluate, which makes the process of statistical modeling more transparent to users and decision makers. This is valuable because it forces practitioners to consider assumptions at the beginning of the model building procedure, rather than at the end (or not at all). In this sense, Stan is the opposite of a “black box” modeling approach. This approach may be tedious and labor intensive at first, but the pay-offs are large. For example, once a model is set up, inferential tasks are all essentially automatic, as changing the model does not change how one analyzes the data. This is a generic approach to inference. To illustrate these points, we use Stan to study a ballistic miss distance problem. In ballistic missile testing, the p-content circular error probable (CEP) is the circle that contains p percent of future shots fired, on average. Statistically, CEP is a bivariate prediction region, constrained by the model to be circular. In Frequentist statistics, the determination of CEP is highly dependent on the model fit, and a different calculation of CEP must be produced for each plausible model. However, with Stan, we can approach the CEP calculation invariant of the model we use to fit the data. We show how to use Stan to calculate CEP and uncertainty intervals for the parameters using summary statistics. Statistical practitioners can access Stan from several programming languages, including R and Python. A brief illustrative code sketch follows this entry. |
John Haman | Breakout |
![]() | 2019 | ||||||||
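As a minimal illustration of the CEP idea in the abstract above (not the briefing’s Stan model), the sketch below estimates a p-content CEP radius directly from posterior predictive draws of miss coordinates; here the draws come from an assumed bivariate normal purely for illustration, but the same quantile computation applies to draws produced by a fitted Stan model.

```python
# Minimal sketch: p-content CEP radius from (stand-in) posterior predictive draws.
import numpy as np

rng = np.random.default_rng(3)
p = 0.50                                    # 50%-content CEP
# Stand-in for pooled posterior predictive (x, y) miss coordinates from a fit
misses = rng.multivariate_normal([0.5, -0.2], [[4.0, 0.5], [0.5, 2.5]], size=20000)

radial_miss = np.linalg.norm(misses, axis=1)    # distance from the aim point
cep = np.quantile(radial_miss, p)               # radius containing p of future shots
print(f"estimated {int(p * 100)}%-content CEP radius: {cep:.2f}")
```

With a real fit, repeating the quantile computation draw by draw over the posterior yields an uncertainty interval for CEP itself.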
Target Location Error Estimation Using Parametric Models |
James Brownlow | Breakout |
![]() | 2019 | ||||||||
Anatomy of a Cyberattack: Standardizing Data Collection for Adversarial and Defensive Analyses (Abstract)
Hardly a week goes by without news of a cybersecurity breach or an attack by cyber adversaries against a nation’s infrastructure. These incidents have wide-ranging effects, including reputational damage and lawsuits against corporations with poor data handling practices. Further, these attacks do not require the direction, support, or funding of technologically advanced nations; instead, significant damage can be – and has been – done with small teams, limited budgets, modest hardware, and open source software. Due to the significance of these threats, it is critical to analyze past events to predict trends and emerging threats. In this document, we present an implementation of a cybersecurity taxonomy and a methodology to characterize and analyze all stages of a cyberattack. The chosen taxonomy, MITRE ATT&CK™, allows for detailed definitions of aggressor actions which can be communicated, referenced, and shared uniformly throughout the cybersecurity community. We translate several open source cyberattack descriptions into the analysis framework, thereby constructing cyberattack data sets. These data sets (supplemented with notional defensive actions) illustrate example Red Team activities. The data collection procedure, when used during penetration testing and Red Teaming, provides valuable insights about the security posture of an organization, as well as the strengths and shortcomings of the network defenders. Further, these records can support past trends and future outlooks of the changing defensive capabilities of organizations. From these data, we are able to gather statistics on the timing of actions, detection rates, and cyberattack tool usage. Through analysis, we are able to identify trends in the results and compare the findings to prior events, different organizations, and various adversaries. |
Jason Schlup | Breakout |
![]() | 2019 | ||||||||
A Survey of Statistical Methods in Aeronautical Ground Testing |
Drew Landman | Breakout |
![]() | 2019 | ||||||||
Your Mean May Not Mean What You Mean It to Mean (Abstract)
The average and standard deviation of, say, strength or dimensional test data are basic engineering math, simple to calculate. What those resulting values actually mean, however, may not be simple, and can be surprisingly different from what a researcher wants to calculate and communicate. Mistakes can lead to overlarge estimates of spread, structures that are over- or under-designed, and other challenges to understanding or communicating what your data is really telling you. This talk will discuss some common errors and missed opportunities seen in engineering and scientific analyses along with mitigations that can be applied through smart and efficient test planning and analysis. It will cover when – and when not – to report a simple mean of a dataset based on the way the data was taken; why ignoring this often either hides or overstates risk; and a standard method for planning tests and analyses to avoid this problem. And it will cover what investigators can correctly (or incorrectly) say about means and standard deviations of data, including how and why to describe uncertainty and assumptions depending on what a value will be used for. The presentation is geared toward the engineer, scientist or project manager charged with test planning, data analysis or understanding findings from tests and other analyses. Attendees’ basic understanding of quantitative data analysis is recommended; more-experienced participants will grasp correspondingly more nuance from the pitch. Some knowledge of statistics is helpful, but not required. Participants will be challenged to think about an average as not just “the average”, but a valuable number that can and must relate to the engineering problem to be solved, and must be firmly based in the data. Attendees will leave the talk with a more sophisticated understanding of this basic, ubiquitous but surprisingly nuanced statistic and greater appreciation of its power as an engineering tool. A brief illustrative code sketch follows this entry. |
Ken Johnson | Breakout |
![]() | 2019 | ||||||||
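A tiny numerical illustration of the kind of pitfall the talk addresses (all numbers invented): when data arrive in unbalanced batches, the pooled mean, the mean of batch means, and the naive standard deviation can tell quite different stories.

```python
# Minimal sketch: pooled mean vs. mean of lot means for unbalanced, batch-structured data.
import numpy as np

rng = np.random.default_rng(4)
lot_means, sizes = [100.0, 104.0, 97.0], [30, 5, 5]        # three lots, unbalanced sampling
lots = [rng.normal(m, 1.0, n) for m, n in zip(lot_means, sizes)]
data = np.concatenate(lots)

pooled_mean = data.mean()                                  # dominated by the heavily sampled lot
mean_of_lot_means = np.mean([lot.mean() for lot in lots])  # treats each lot equally
print(f"pooled mean = {pooled_mean:.2f}, mean of lot means = {mean_of_lot_means:.2f}")
print(f"naive SD = {data.std(ddof=1):.2f}  (mixes within- and between-lot spread)")
```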
A Causal Perspective on Reliability Assessment (Abstract)
Causality in an engineered system pertains to how a system output changes due to a controlled change or intervention on the system or system environment. Engineered systems designs reflect a causal theory regarding how a system will work, and predicting the reliability of such systems typically requires knowledge of this underlying causal structure. The aim of this work is to introduce causal modeling tools that inform reliability predictions based on biased data sources and illustrate how these tools can inform data integration in practice. We present a novel application of the popular structural causal modeling framework to reliability estimation in an engineering application, illustrating how this framework can inform whether reliability is estimable and how to estimate reliability using data integration given a set of assumptions about the subject matter and data generating mechanism. When data are insufficient for estimation, sensitivity studies based on problem-specific knowledge can inform how much reliability estimates can change due to biases in the data and what information should be collected next to provide the most additional information. We apply the approach to a pedagogical example related to a real, but proprietary, engineering application, considering how two types of biases in data can influence a reliability calculation. |
Lauren Hund | Breakout |
![]() | 2019 | ||||||||
Hypergames for Control System Security (Abstract)
The identification of the Stuxnet worm in 2010 provided a highly publicized example of a cyber attack on an industrial control system. This raised public awareness about the possibility of similar attacks against other industrial targets – including critical infrastructure. Here, we use hypergames to analyze how adversarial perturbations can be used to manipulate a system using optimal control. Hypergames form an extension of game theory that models strategic interactions between players with significantly different perceptions of the game(s) they are playing. Previous work on hypergames has been limited to simple interactions, where a small set of discrete choices are available to each player. However, we apply hypergames to larger systems with continuous variables. Our results highlight that manipulating constraints can be a more effective attacker strategy than directly manipulating objective function parameters. Moreover, the attacker need not influence the underlying system to carry out a successful attack – it may be sufficient to deceive the defender controlling the system. Finally, we identify several characteristics that will make our analysis amenable to higher-dimensional control systems. |
Arnab Bhattacharya | Breakout |
![]() | 2019 | ||||||||
Probabilistic Data Synthesis to Provide a Defensible Risk Assessment for Army Munition (Abstract)
Military-grade energetics are, by design, required to operate under extreme conditions. As such, warheads in a munition must demonstrate a high level of structural integrity in order to ensure safe and reliable operation by the Warfighter. In this example, which involved an artillery munition, a systematic analytics-driven approach was executed which synthesized physical test data results with probabilistic analysis, non-destructive evaluation, modeling and simulation, and comprehensive risk analysis tools in order to determine the probability of a catastrophic event. Once the severity, probability of detection, and occurrence were synthesized, a model was built to determine the risk of a catastrophic event during firing, which then accounts for defect growth occurring as a result of rough handling. This comprehensive analysis provided a defensible, credible, and dynamic snapshot of risk while allowing for a transparent assessment of the contribution to risk of the various inputs through sensitivity analyses. This paper will illustrate the intersection of product safety, reliability, systems-safety policy, and analytics, and highlight the impact of a holistic multidisciplinary approach. The benefits of this rigorous assessment included quantifying risk to the user, supporting effective decision-making, improving resultant safety and reliability of the munition, and supporting triage and prioritization of future Non-Destructive Evaluation (NDE) screening efforts by identifying at-risk subpopulations. |
Kevin Singer | Breakout |
![]() | 2019 | ||||||||
Using Bayesian Neural Networks for Uncertainty Quantification of Hyperspectral Image Target Detection (Abstract)
Target detection in hyperspectral images (HSI) has broad value in defense applications, and neural networks have recently begun to be applied to this problem. A common criticism of neural networks is that they give a point estimate with no uncertainty quantification (UQ). In defense applications, UQ is imperative because the cost of a false positive or negative is high. Users desire high confidence in either “target” or “not target” predictions, and if high confidence cannot be achieved, more inspection is warranted. One possible solution is Bayesian neural networks (BNN). Compared to traditional neural networks, which are constructed by choosing a loss function, BNNs take a probabilistic approach and place a likelihood function on the data and prior distributions for all parameters (weights and biases), which in turn implies a loss function. Training results in posterior predictive distributions, from which prediction intervals can be computed, rather than only point estimates. Heatmaps show where and how much uncertainty there is at any location and give insight into the physical area being imaged as well as possible improvements to the model. Using pytorch and pyro software, we test a BNN on a simulated HSI scene produced using the Rochester Institute of Technology (RIT) Digital Imaging and Remote Sensing Image Generation (DIRSIG) model. The scene geometry used is also developed by RIT and is a detailed representation of a suburban neighborhood near Rochester, NY, named “MegaScene.” Target panels were inserted for this effort, using paint reflectance and bi-directional reflectance distribution function (BRDF) data acquired from the Nonconventional Exploitation Factors Database System (NEFDS). The target panels range in size from large to subpixel, with some targets only partially visible. Multiple renderings of this scene are created under different times of day and with different atmospheric conditions to assess model generalization. We explore the uncertainty heatmap for different times and environments on MegaScene as well as individual target predictive distributions to gain insight into the power of BNN. A brief illustrative code sketch follows this entry. |
Daniel Ries | Breakout |
![]() | 2019 | ||||||||
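The sketch below is a drastically simplified stand-in for the authors’ BNN: a one-layer Bayesian classifier on made-up “spectra,” fit with stochastic variational inference in Pyro. It is meant only to show how posterior predictive draws yield per-pixel uncertainty; all sizes, priors, and data are invented.

```python
# Minimal sketch: a one-layer Bayesian classifier with Pyro; the spread of
# posterior predictive draws plays the role of an uncertainty heatmap.
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO, Predictive
from pyro.infer.autoguide import AutoDiagonalNormal
from pyro.optim import Adam

torch.manual_seed(0)
n_pixels, n_bands = 400, 30
X = torch.randn(n_pixels, n_bands)                      # stand-in spectra
true_w = torch.zeros(n_bands)
true_w[:3] = 2.0
y = torch.bernoulli(torch.sigmoid(X @ true_w))          # synthetic target labels

def model(X, y=None):
    w = pyro.sample("w", dist.Normal(torch.zeros(n_bands), 1.0).to_event(1))
    b = pyro.sample("b", dist.Normal(0.0, 1.0))
    logits = X @ w + b
    with pyro.plate("pixels", X.shape[0]):
        pyro.sample("obs", dist.Bernoulli(logits=logits), obs=y)

guide = AutoDiagonalNormal(model)
svi = SVI(model, guide, Adam({"lr": 0.02}), loss=Trace_ELBO())
for _ in range(2000):
    svi.step(X, y)

post = Predictive(model, guide=guide, num_samples=500)(X)
p_target = post["obs"].float().mean(0)                  # posterior mean P(target) per pixel
uncertainty = post["obs"].float().std(0)                # per-pixel predictive spread
print(p_target[:5], uncertainty[:5])
```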
Constructing Designs for Fault Location (Abstract)
When fault testing a system with many factors, each appearing at some number of levels, it may not be possible to test all combinations of factor levels. Most faults are caused by interactions of only a few factors, so testing interactions up to size t will often find all faults in the system without executing an exhaustive test suite. Call an assignment of levels to t of the factors a t-way interaction. A covering array is a collection of tests that ensures that every t-way interaction is covered by at least one test in the test suite. Locating arrays extend covering arrays with the additional feature that they not only indicate the presence of faults but locate the faulty interactions when there are no more than d faults in the system. If an array is (d, t)-locating, for every pair of sets of t-way interactions of size d, the interactions do not appear in exactly the same tests. This ensures that the faulty interactions can be differentiated from non-faulty interactions by the results of some test in which interactions from one set or the other, but not both, are tested. When the property holds for t-way interaction sets of size up to d, the notation (d̄, t) is used. In addition to fault location, locating arrays have also been used to identify significant effects in screening experiments. Locating arrays are fairly new and few techniques have been explored for their construction. Most of the available work is limited to finding only one fault (d = 1). Known general methods require a covering array of strength t + d and produce many more tests than are needed. In this talk, we present Partitioned Search with Column Resampling (PSCR), a computational search algorithm to verify if an array is (d̄, t)-locating by partitioning the search space to decrease the number of comparisons. If a candidate array is not locating, random resampling is performed until a locating array is constructed or an iteration limit is reached. Algorithmic parameters determine which factor columns to resample and when to add additional tests to the candidate array. We use a 5 × 5 × 3 × 2 × 2 full factorial design to analyze the performance of the algorithmic parameters and provide guidance on how to tune parameters to prioritize speed, accuracy, or a combination of both. Last, we compare our results to the number of tests in locating arrays constructed for the factors and levels of real-world systems produced by other methods. A brief illustrative code sketch follows this entry. |
Erin Lanus | Breakout |
![]() | 2019 | ||||||||
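As a concrete anchor for the definitions above, the sketch below checks the covering-array property (every t-way interaction appears in at least one test) for a small example; it is illustrative only and does not implement the PSCR algorithm or the locating-array check itself.

```python
# Minimal sketch: verify the covering-array property for strength t.
from itertools import combinations, product

def is_covering(tests, levels, t):
    """tests: list of tuples of level indices; levels: number of levels per factor."""
    k = len(levels)
    for cols in combinations(range(k), t):                  # every set of t factors
        needed = set(product(*[range(levels[c]) for c in cols]))
        seen = {tuple(row[c] for c in cols) for row in tests}
        if needed - seen:                                    # some t-way interaction untested
            return False
    return True

# Five tests covering all 2-way interactions of four 2-level factors
tests = [(0, 0, 0, 0), (0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 0)]
print(is_covering(tests, [2, 2, 2, 2], t=2))   # True
```

The locating property adds a stronger requirement on top of this: distinct small sets of interactions must not be covered by exactly the same set of tests.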
Accelerating Uncertainty Quantification for Complex Computational Models (Abstract)
Scientific computing has undergone extraordinary growth in sophistication in recent years, enabling the simulation of a wide range of complex multiphysics and multiscale phenomena. Along with this increase in computational capability is the growing recognition that uncertainty quantification (UQ) must go hand-in-hand with numerical simulation in order to generate meaningful and reliable predictions for engineering applications. If not rigorously considered, uncertainties due to manufacturing defects, material variability, modeling assumptions, etc. can cause a substantial disconnect between simulation and reality. Packaging these complex computational models within a UQ framework, however, can be a significant challenge due to the need to repeatedly evaluate the model when even a single evaluation is time-consuming. This talk discusses efforts at NASA Langley Research Center (LaRC) to enable rapid UQ for problems with expensive computational models. Under the High Performance Computing Incubator (HPCI) program at LaRC, several open-source software libraries are being developed and released to provide access to general-purpose, state-of-the-art UQ algorithms. The common denominator of these methods is that they all expose parallelism among the model evaluations needed for UQ and, as such, are implemented to leverage HPC resources when available to achieve tremendous computational speedup. While the methods and software presented are broadly applicable, they will be demonstrated in the context of applications that have particular interest at NASA, including structural health management and trajectory simulation. A brief illustrative code sketch follows this entry. |
James Warner | Breakout |
![]() | 2019 | ||||||||
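The parallelism described above can be illustrated with a toy example: independent model evaluations for a Monte Carlo study farmed out across processes. The expensive_model function below is a trivial stand-in for a real simulation code; none of this is LaRC’s software.

```python
# Minimal sketch: parallel Monte Carlo uncertainty propagation with a stand-in model.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def expensive_model(x):
    # placeholder for a costly multiphysics simulation
    return np.sin(x[0]) + 0.5 * x[1] ** 2

def main():
    rng = np.random.default_rng(5)
    samples = rng.normal(size=(10_000, 2))      # draws of the uncertain inputs
    with ProcessPoolExecutor() as pool:         # evaluations are independent, so parallelize
        outputs = np.asarray(list(pool.map(expensive_model, samples, chunksize=256)))
    print("mean:", outputs.mean(),
          "95% interval:", np.quantile(outputs, [0.025, 0.975]))

if __name__ == "__main__":
    main()
```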
When Validation Fails: Analysis of Data from an Imperfect Test Chamber (Abstract)
For chemical/biological testing, test chambers are sometimes designed with a vapor or aerosol homogeneity requirement. For example, a test community may require that the difference in concentration between any two test locations in a chamber be no greater than 20 percent. To validate the chamber, testers must demonstrate that such a requirement is met with a specified amount of certainty, such as 80 percent. With a validated chamber, multiple systems can be simultaneously tested at different test locations with the assurance that each system is exposed to nearly the same concentration. In some cases, however, homogeneity requirements are difficult to achieve. This presentation demonstrates a valid Bayesian method for testing probability of detection as a function of concentration in a chamber that fails to meet a homogeneity requirement. The demonstrated method of analysis is based on recent experience with an actual test chamber. Multiple systems are tested simultaneously at different locations in the chamber. Because systems tested in the chamber are exposed to different concentrations depending on their locations, the differences must be quantified to the greatest extent possible. To this end, data from the failed validation efforts are used to specify informative prior distributions for probability-of-detection modeling. Because these priors quantify and incorporate uncertainty in model parameters, they ensure that the final probability-of-detection model constitutes a valid comparison of the performance of the different systems. A brief illustrative code sketch follows this entry. |
Kendal Ferguson | Breakout |
![]() | 2019 | ||||||||
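The preceding abstract builds informative priors from the failed validation data; the sketch below shows the general flavor in the simplest possible setting, a Beta prior on detection probability at one location updated by new binomial trials. All numbers are invented, and the actual analysis models detection as a function of concentration.

```python
# Minimal sketch: informative Beta prior updated by new detection trials.
import numpy as np
from scipy import stats

# Informative prior: earlier chamber (validation) work suggested detection near 0.8
prior_a, prior_b = 8.0, 2.0                    # Beta(8, 2): mean 0.8, modest weight

detections, trials = 17, 20                    # new trials at one test location
posterior = stats.beta(prior_a + detections, prior_b + trials - detections)

print("posterior mean P(detect):", round(posterior.mean(), 3))
print("95% credible interval:", np.round(posterior.interval(0.95), 3))
```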
Technical Leadership Panel-Tuesday Afternoon |
Dr. Catherine Warner Science Advisor DOT&E |
Breakout | 2016 | |||||||||
Technical Leadership Panel-Tuesday Afternoon |
Paul Roberts Chief Engineer Engineering and Safety Center |
Breakout | 2016 | |||||||||
Technical Leadership Panel-Tuesday Afternoon |
Frank Peri Deputy Director Langley Engineering Directorate |
Breakout | 2016 | |||||||||
Technical Leadership Panel-Tuesday Afternoon |
CAPT Peter Matisoo Technical Director COTF |
Breakout | 2016 | |||||||||
Technical Leadership Panel-Tuesday Afternoon |
Jeff Olinger Technical Director AFOTEC |
Breakout | 2016 | |||||||||
Combining Information for Reliability Assessment-Tuesday Morning |
Dr. Alyson Wilson North Carolina State University |
Breakout | materials | 2016 | ||||||||
Reliability Growth in T&E – Summary of National Research Council’s Committee on National Statistics Report Findings-Tuesday Morning |
Dr. Art Fries Research Staff Member IDA |
Breakout | materials | 2016 | ||||||||
Design and Analysis of Margin Testing in Support of Product Qualification for High Reliability Systems |
Dr. Justin Newcomer Sandia National Lab |
Breakout | 2016 | |||||||||
Three Case Studies Comparing Traditional versus Modern Test Designs (Abstract)
There are many testing situations that historically involve a large number of runs. The use of experimental design methods can reduce the number of runs required to obtain the information desired. Example applications include wind tunnel test campaigns, computational experiments, and live fire tests. In this work we present three case studies conducted under the auspices of the Science of Test Research Consortium comparing the information obtained via a historical experimental approach with the information obtained via an experimental design approach. The first case study involves a large scale wind tunnel experimental campaign. The second involves a computational fluid dynamics model of a missile through various speeds and angles of attack. The third case involves ongoing live fire testing involving hot surface testing. In each case, results suggest a tremendous opportunity to reduce experimental test efforts without losing test information. |
Dr. Ray Hill Air Force Institute of Technology |
Breakout | materials | 2016 | ||||||||
Supersaturated Designs: Construction and Analysis (Abstract)
An important property of any experimental design is its ability to detect active factors. For supersaturated designs, in which model parameters outnumber experimental runs, power is even more critical. In this talk, we review several popular supersaturated design construction criteria and analysis methods. We then demonstrate how simulation studies can be useful for practitioners in selecting a supersaturated design with regard to its power to detect active factors. One of our findings based on an extensive simulation study is that although differences clearly exist among analysis methods, most supersaturated design construction methods are indistinguishable in terms of power. This conclusion can be reassuring for practitioners as supersaturated designs can then be sensibly chosen based upon convenience. For instance, the Bayesian D-optimal supersaturated designs can be easily constructed in JMP and SAS for any run size and number of factors. On the other hand, software for constructing E(s2)-optimal supersaturated designs is not as accessible. A brief illustrative code sketch follows this entry. |
Dr. David Edwards Virginia Commonwealth University |
Breakout | 2016 | |||||||||
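The sketch below gives the flavor of the power simulations described above, under stated assumptions: a random two-level design stands in for a constructed supersaturated design, responses are simulated with three active factors, and a lasso analysis stands in for the analysis methods compared in the talk.

```python
# Minimal sketch: estimate the power of a supersaturated-style analysis to flag
# three active factors, via simulation.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(6)
n_runs, n_factors = 14, 24                     # more factors than runs
active = [0, 1, 2]                             # indices of truly active factors
beta = np.zeros(n_factors)
beta[active] = 3.0                             # assumed effect size

hits, n_sims = 0, 200
for _ in range(n_sims):
    X = rng.choice([-1.0, 1.0], size=(n_runs, n_factors))   # stand-in SSD
    y = X @ beta + rng.normal(0, 1, n_runs)
    coef = LassoCV(cv=5).fit(X, y).coef_
    hits += all(abs(coef[a]) > 1e-8 for a in active)        # all active effects flagged?
print("estimated power to detect all active factors:", hits / n_sims)
```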
Statistically Defensible Experiment Design for Wind Tunnel Characterization of Subscale Parachutes for Mission to Mars (Abstract)
https://s3.amazonaws.com/workshop-archives-2016/IDA+Workshop+2016/testsciencemeeting.ida.org/pdfs/1b-ExperimentalDesignMethodsandApplications.pdf |
Dr. Drew Landman Old Dominion University |
Breakout | materials | 2016 | ||||||||
Managing Uncertainty in the Context of Risk Acceptance Decision Making at NASA: Thinking Beyond the Model (Abstract)
NASA has instituted requirements for establishing Agency-level safety thresholds and goals that define “long-term targeted and maximum tolerable levels of risk to the crew as guidance to developers in evaluating ‘how safe is safe enough’ for a given type of mission.” With the adoption of this policy for human space flight and with ongoing Agency efforts to increase formality in the development and review of the basis for risk acceptance, the decision-support demands placed on risk models are becoming more stringent at NASA. While these models play vital roles in informing risk acceptance decisions, they are vulnerable to incompleteness of risk identification, as well as to incomplete understanding of probabilities of occurrence, potentially leaving a substantial portion of the actual risk unaccounted for, especially for new systems. This presentation argues that management of uncertainty about the “actual” safety performance of a system must take into account the contribution of unknown and/or underappreciated (UU) risk. Correspondingly, responsible risk-acceptance decision-making requires the decision-maker to think beyond the model and address factors (e.g., organizational and management factors) that live outside traditional engineering risk models. This presentation advocates the use of a safety-case approach to risk acceptance decision-making. |
Dr. Homayoon Dezfuli NASA |
Breakout | materials | 2016 | ||||||||
Acceptability of Radiation Detection Systems (Abstract)
The American National Standards Institute (ANSI) maintains a set of test standards that provide methods to characterize and determine the acceptability of radiation detection systems for use in Homeland security. With a focus on the environmental, electromagnetic, and mechanical functionality tests, we describe the test formulation and discuss challenges faced in administering the standard to include the assurance of comparable evaluations across multiple test facilities and the handling of systems that provide a non-standard, unit-less response. We present proposed solutions to these difficulties that are currently being considered in updated versions of the ANSI standards. We briefly describe a decision analytic approach that could allow for the removal of minimum performance requirements from the standards and enable the end user to determine system suitability based on operation-specific requirements. |
Dr. Dennis Leber NIST |
Breakout | materials | 2016 | ||||||||
Risk Analysis for Orbital Debris Wire Harness Failure Assessment for the Joint Polar Satellite System (Abstract)
This paper presents the results of two hypervelocity impact failure probabilistic risk assessments for critical wire bundles exposed aboard the Joint Polar Satellite System (JPSS-1) to an increased orbital debris environment at its 824 km, 98.8 deg inclination orbit. The first “generic” approach predicted the number of wires broken by orbital debris ejecta emerging from normal impact with multi-layer insulation (MLI) covering 36-, 18-, and 6-strand wire bundles at a 5 cm standoff using the Smooth Particle Hydrodynamic (SPH) code. This approach also included a mathematical approach for computing the probability that redundant wires were impacted then severed within the bundle. Based in part on the high computed risk of a critical wire bundle failure from the generic approach, an enhanced orbital debris protection design was examined, consisting of betacloth-reinforced MLI suspended at a 5 cm standoff over a seven layer betacloth and Kevlar blanket, draped over the exposed wire bundles. A second SPH-based risk assessment was conducted that also included the beneficial effects from the high (75 degree) obliquity of orbital debris impact and shadowing by other spacecraft components, and resulted in a considerably reduced likelihood of critical wire bundle failure compared to the original baseline design. |
Dr. Joel Williamsen IDA |
Breakout | materials | 2016 | ||||||||
Application of Statistical Engineering to Mixture Problems With Process Variables (Abstract)
Statistical engineering has been defined as: “The study of how to best utilize statistical concepts, methods, and tools, and integrate them with IT and other relevant sciences to generate improved results.” A key principle is that significant untapped benefits are often achievable by integrating multiple methods in novel ways to address a problem, without having to invent new statistical techniques. In this presentation, we discuss the application of statistical engineering to the problem of design and analysis of mixture experiments when process variables are also involved. In such cases, models incorporating interaction between the mixture and process variables have been developed, but tend to require large designs and models. By considering models nonlinear in the parameters, also well known in the literature, we demonstrate how experimenters can utilize an alternative, iterative strategy in attacking such problems. We show that this strategy potentially saves considerable experimental time and effort, while producing models that are nearly as accurate as much larger linear models. These results are illustrated using two published data sets and one new data set, all involving interactive mixture and process variable problems. |
Dr. Roger Hoerl Union College |
Breakout | materials | 2016 | ||||||||
The Sixth Sense: Clarity through Statistical Engineering (Abstract)
Two responses to an expensive, time consuming test on a final product will be referred to as “adhesion” and “strength”. A screening test was performed on compounds that comprise the final product. These screening tests are multivariate profile measurements. Previous models to predict the expensive, time consuming test lacked accuracy and precision. Data visualization was used to guide a statistical engineering model that makes use of multiple statistical techniques. The modeling approach raised some interesting statistical questions for partial least square models regarding over-fitting and cross validation. Ultimately, the model interpretation and the visualization both make engineering sense and led to interesting insights regarding the product development program and screening compounds. |
Dr. Jennifer Van-Mullekom DuPont |
Breakout | 2016 | |||||||||
Case Studies for Statistical Engineering Applied to Powered Rotorcraft Wind-Tunnel Tests (Abstract)
Co-Authors: Sean A. Commo, Ph.D., P.E. and Peter A. Parker, Ph.D., P.E. NASA Langley Research Center, Hampton, Virginia, USA Austin D. Overmeyer, Philip E. Tanner, and Preston B. Martin, Ph.D. U.S. Army Research, Development, and Engineering Command, Hampton, Virginia, USA. The application of statistical engineering to helicopter wind-tunnel testing was explored during two powered rotor entries. The U.S. Army Aviation Development Directorate Joint Research Program Office and the NASA Revolutionary Vertical Lift Project performed these tests jointly at the NASA Langley Research Center. Both entries were conducted in the 14- by 22-Foot Subsonic Tunnel with a small segment of the overall tests devoted to developing case studies of a statistical engineering approach. Data collected during each entry were used to estimate response surface models characterizing vehicle performance, a novel contribution of statistical engineering applied to powered rotor-wing testing. Additionally, a 16- to 47-times reduction in the number of data points required was estimated when comparing a statistically-engineered approach to a conventional one-factor-at-a-time approach. |
Dr. Sean Commo NASA |
Breakout | 2016 | |||||||||
Bayesian Adaptive Design for Conformance Testing with Bernoulli Trials (Abstract)
Co-authors: Adam L. Pintar, Blaza Toman, and Dennis Leber. A task of the Domestic Nuclear Detection Office (DNDO) is the evaluation of radiation and nuclear (rad/nuc) detection systems used to detect and identify illicit rad/nuc materials. To obtain estimated system performance measures, such as probability of detection, and to determine system acceptability, the DNDO sometimes conducts large-scale field tests of these systems at great cost. Typically, non-adaptive designs are employed where each rad/nuc test source is presented to each system under test a predetermined and fixed number of times. This approach can lead to unnecessary cost if the system is clearly acceptable or unacceptable. In this presentation, an adaptive design with Bayesian decision theoretic foundations is discussed as an alternative to, and contrasted with, the more common single stage design. Although the basis of the method is Bayesian decision theory, designs may be tuned to have desirable type I and II error rates. While the focus of the presentation is a specific DNDO example, the method is applicable widely. Further, since constructing the designs is somewhat compute intensive, software in the form of an R package will be shown and is available upon request. A brief illustrative code sketch follows this entry. |
Dr. Adam Pintar NIST |
Breakout | materials | 2016 | ||||||||
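As a simple illustration of the adaptive idea described above (not the authors’ decision-theoretic design or R package), the sketch below updates a Beta posterior for probability of detection after each trial and stops once the posterior probability of meeting an illustrative 0.85 requirement is decisively high or low.

```python
# Minimal sketch: Bayesian adaptive acceptance testing with Bernoulli trials.
import numpy as np
from scipy import stats

def adaptive_test(p_true, requirement=0.85, max_trials=200, cut=0.95, seed=0):
    rng = np.random.default_rng(seed)
    a, b = 1.0, 1.0                                     # uniform Beta prior on P(detect)
    for trial in range(1, max_trials + 1):
        a, b = (a + 1, b) if rng.random() < p_true else (a, b + 1)
        p_meets = 1.0 - stats.beta.cdf(requirement, a, b)   # P(p >= requirement | data)
        if p_meets > cut:
            return "accept", trial                      # clearly acceptable: stop early
        if p_meets < 1.0 - cut:
            return "reject", trial                      # clearly unacceptable: stop early
    return "no decision", max_trials

print(adaptive_test(p_true=0.95))    # acceptable system: usually stops early
print(adaptive_test(p_true=0.70))    # unacceptable system: also stops early
```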
Statistical Models for Combining Information: Stryker Reliability Case Study |
Dr. Rebecca Dickinson IDA |
Breakout | materials | 2016 | ||||||||
Bayesian Estimation of Reliability Growth |
Dr. Jim Brownlow U.S. Air Force 812TSS/ENT |
Breakout | materials | 2016 | ||||||||
Recent use of Statistical Methods in NASA Aeroscience Testing Research and Development Activities (Abstract)
Over the past 10 years, a number of activities have incorporated statistical methods for the purpose of database development and associated uncertainty modeling. This presentation will highlight approaches taken in aerodynamic database development for space vehicle projects, specifically the Orion spacecraft and abort vehicle, and the Space Launch System (SLS) launch vehicle. Additionally, statistical methods have been incorporated into the Commercial Supersonic Transport Project for test technique development and optimization as well as a certification prediction methodology, which is planned to be verified with the Low-Boom Flight Demonstrator data. Discussion will conclude with the use of statistical methods for quality control and assurance in the NASA Langley Research Center ground testing facilities related to our Check Standard Project and characterization and calibration testing. |
Dr. Eric Walker NASA |
Breakout | 2016 | |||||||||
“High Velocity Analytics for NASA JPL Mars Rover Experimental Design” (Abstract)
Rigorous characterization of system capabilities is essential for defensible decisions in test and evaluation (T&E). Analysis of designed experiments is not usually associated with “big” data analytics, as there are typically a modest number of runs, factors, and responses. The Mars Rover program has recently conducted several disciplined DOEs on prototype coring drill performance with approximately 10 factors along with scores of responses and hundreds of recorded covariates. The goal is to characterize the ‘at-this-time’ capability to confirm what the scientists and engineers already know about the system, answer specific performance and quality questions across multiple environments, and inform future tests to optimize performance. A ‘rigorous’ characterization required that not just one analytical path should be taken, but a combination of interactive data visualization, classic DOE analysis screening methods, and newer methods from predictive analytics such as decision trees. With hundreds of response surface models across many test series and qualitative factors, the methods used had to efficiently find the signals hidden in the noise. Participants will be guided through an end-to-end analysis workflow with actual data from many tests (often Definitive Screening Designs) of the Rover prototype coring drill. We will show data assembly, data cleaning (e.g. missing values and outliers), data exploration with interactive graphical displays, variable screening, response partitioning, data tabulation, model building with stepwise and other methods, and model diagnostics. Software packages such as R and JMP will be used. |
Dr. Jim Wisnowski Co-founder/Principal Adsurgo (bio)
James Wisnowski provides training and consulting services in Design of Experiments, Predictive Analytics, Reliability Engineering, Quality Engineering, Text Mining, Data Visualization, and Forecasting to government and industry. Previously, he spent a career in analytics for the government. He retired from the Air Force having had leadership positions at the Pentagon, Air Force Academy, Air Force Operational Test and Evaluation Center, and units across the Air Force. He has published numerous papers in technical journals and presented several invited conference presentations. He was co-author of the Design and Analysis of Experiments by Douglas Montgomery: A Supplement for using JMP. |
Breakout | materials | 2016 | ||||||||
“High Velocity Analytics for NASA JPL Mars Rover Experimental Design” (Abstract)
Rigorous characterization of system capabilities is essential for defensible decisions in test and evaluation (T&E). Analysis of designed experiments is not usually associated with “big” data analytics, as there are typically a modest number of runs, factors, and responses. The Mars Rover program has recently conducted several disciplined DOEs on prototype coring drill performance with approximately 10 factors along with scores of responses and hundreds of recorded covariates. The goal is to characterize the ‘at-this-time’ capability to confirm what the scientists and engineers already know about the system, answer specific performance and quality questions across multiple environments, and inform future tests to optimize performance. A ‘rigorous’ characterization required that not just one analytical path should be taken, but a combination of interactive data visualization, classic DOE analysis screening methods, and newer methods from predictive analytics such as decision trees. With hundreds of response surface models across many test series and qualitative factors, the methods used had to efficiently find the signals hidden in the noise. Participants will be guided through an end-to-end analysis workflow with actual data from many tests (often Definitive Screening Designs) of the Rover prototype coring drill. We will show data assembly, data cleaning (e.g. missing values and outliers), data exploration with interactive graphical displays, variable screening, response partitioning, data tabulation, model building with stepwise and other methods, and model diagnostics. Software packages such as R and JMP will be used. |
Dr. Heath Rushing Co-founder/Principal Adsurgo (bio)
Heath Rushing is the cofounder of Adsurgo and author of the book Design and Analysis of Experiments by Douglas Montgomery: A Supplement for using JMP. Previously, he was the JMP Training Manager at SAS, a quality engineer at Amgen, an assistant professor at the Air Force Academy, and a scientific analyst for OT&E in the Air Force. In addition, over the last six years, he has taught Science of Tests (SOT) courses to T&E organizations throughout the DoD. |
Breakout | materials | 2016 | ||||||||
An Engineer and a Statistician Walk into a Bar: A Statistical Engineering Negotiation (Abstract)
A question about whether post-processing affects the flammability of a given metal in an oxygen-enriched environment could be answered with the sensitivity of a standard test that is typically used to measure differences in flammability between different metals. The principal investigator was familiar with design of experiments (DOE) and wanted to optimize the value of the information to be gained. This talk will focus on the interchange between the engineer and a statistician. Their negotiations will illustrate the process of clarifying the problem, planning the test (including choosing factors and responses), running the experiment, and analyzing and reporting on the data. It will focus on the choices made to help squeeze extra knowledge from each test run and leverage statistics to result in increased engineering insight. |
Mr. Ken Johnson NASA |
Breakout | materials | 2016 | ||||||||
Making Statistically Defensible Testing ‘The way we do things around here’ (Abstract)
For the past 7 years, USAF DT&E has been exploring ways to adapt the principles of experimental design to rapidly evolving developmental test articles and test facilities – often with great success. This paper discusses three case studies that span the range of USAF DT&E activities from EW to Ground Test to Flight Test and shows the truly revolutionary impact Fisher’s DOE can have on development. The Advanced Strategic and Tactical Expendable (ASTE) testing began in 1990 to develop, enhance, and test new IR flares, flare patterns, and dispense tactics. More than 60 aircraft & flare types have been tested. The typical output is a “Pancake Plot” of 200+ “cases” of flare, aspect angle, range, elevation, and flare effectiveness using a stop-light chart (red-yellow-green) approach. Usual testing involves ~3000 flare engagements costing $1M to participate. The response, troublingly enough, is binary – 15-30 binomial response trials measuring P(Decoy). Binary responses are information-poor. Legacy testing does not assess or present statistical power when reporting P(Decoy) results. Analysts investigated replacing P(Decoy) with continuous metrics – e.g., time to decoy. This research is ongoing. We found we could spread the replicates out to examine 3x to 5x more test conditions without affecting power materially. Analysis with the Generalized Linear Model (GLZ) replaced legacy “cases” analysis with a 75% improvement to confidence intervals from the same data. We are seeking to build a Monte-Carlo simulation to estimate how many runs are required in a logistic regression model to achieve adequate power (a minimal sketch of such a simulation follows this entry). We hope to reduce customer expenditures for flare information by as much as 50%. Co-authors: J. Higdon, B. Knight. AEDC completed a major upgrade with new nozzle hardware to vary Mach. The Arnold Transonic Wind Tunnel 4T spans the range of flows from subsonic to approximately M9. The new wind tunnel was to be computer controlled. A number of key instrumentation improvements were made at the same time. The desire was to calibrate the resulting modified tunnel. The last calibration of 4T was 25 years ago in 1990. The calibration ranged across the full range of Mach and pressure capabilities, spanning a four-dimensional space: pressure, Mach, wall angle, and wall porosity. Both the traditional OFAT effort – vary one factor at a time – and a parallel DOE effort were run to compare design, execution, modeling, and prediction capabilities against cost and time to run. The robust embedded face-centered CCD DOE design (J. Simpson and D. Landman ’05) employed 75 vs. 176 OFAT runs. The RSM design achieved 57% run savings. Due to an admirable discipline in randomization during the DOE trials, the smaller design required longer to run. As a result of using the DOE approach, engineers found it easier to predict off-condition tunnel operating characteristics using RSM models and to optimize facility flow quality for any given test condition. In addition, the RSM regression models support future “spot check” calibration by comparing predictions to measured values. If a measurement falls within the prediction interval, the existing calibration is still appropriate. AEDC is using split-plot style designs for current wind tunnel probe calibration. Co-author: Dr. Doug Garrard. |
Mr. Greg Hutto Air Force 96th Test Wing |
Breakout | 2016 | |||||||||
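The abstract above mentions wanting a Monte Carlo simulation to estimate how many runs a logistic regression model needs for adequate power; the sketch below shows one way such a study could look, with a single coded factor and invented effect sizes.

```python
# Minimal sketch: Monte Carlo power estimation for a logistic regression on a binary response.
import numpy as np
import statsmodels.api as sm

def power(n_runs, beta=(0.5, 1.0), n_sims=500, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    detect = 0
    for _ in range(n_sims):
        x = rng.uniform(-1, 1, n_runs)                  # coded factor (e.g., aspect angle)
        p = 1 / (1 + np.exp(-(beta[0] + beta[1] * x)))  # assumed true decoy probability
        y = rng.binomial(1, p)                          # simulated binary outcomes
        fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
        detect += fit.pvalues[1] < alpha                # slope declared significant?
    return detect / n_sims

for n in (50, 100, 200, 400):
    print(n, "runs -> estimated power", power(n))
```

A parallel study with a continuous response such as time-to-decoy would quantify the run savings the abstract anticipates.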
Using Sequential Testing to Address Complex Full-Scale Live Fire Test and Evaluation (Abstract)
Co-authors: Dr. Darryl Ahner, Director, STAT COE; Dr. Lenny Truett, STAT COE; Mr. Scott Wacker, 96 TG/OL-ACS. This presentation will describe the benefits of sequential testing and demonstrate how sequential testing can be used to address complex test conditions by developing well-controlled early experiments to explore basic questions before proceeding to full-scale testing. This approach can result in increased knowledge and decreased cost. As of FY13, the Air Force had spent an estimated $47M on dry bay fire testing, making fire the largest cost contributor for Live Fire Test and Evaluation (LFT&E) programs. There is currently an estimated 60% uncertainty in total platform vulnerable area (Av) driven by probability of kill (PK) due to ballistically ignited fires. A large part of this uncertainty comes from the fact that current spurt modeling does not predict fuel spurt delay with reasonable accuracy despite a large amount of test data. A low-cost sequential approach was developed to improve spurt models. Initial testing used a spherical projectile to test 10 different factors in a definitive screening design. Once the list of factors was refined, a second phase of testing determined if a suitable methodology could be developed to scale results using water as a surrogate for JP-8 fuel. Finally, testing was performed with cubical projectiles to evaluate the effect of fragment orientation. The entire cost for this effort was less than that of one or two typical full-scale live fire tests. |
Dr. Darryl Ahner AFIT, STAT COE |
Breakout | 2016 | |||||||||
Aerospace Measurement and Experimental System Development Characterization (Abstract)
Co-Authors: Sean A. Commo, Ph.D., P.E. and Peter A. Parker, Ph.D., P.E. NASA Langley Research Center, Hampton, Virginia, USA Austin D. Overmeyer, Philip E. Tanner, and Preston B. Martin, Ph.D. U.S. Army Research, Development, and Engineering Command, Hampton, Virginia, USA. The application of statistical engineering to helicopter wind-tunnel testing was explored during two powered rotor entries. The U.S. Army Aviation Development Directorate Joint Research Program Office and the NASA Revolutionary Vertical Lift Project performed these tests jointly at the NASA Langley Research Center. Both entries were conducted in the 14- by 22-Foot Subsonic Tunnel with a small segment of the overall tests devoted to developing case studies of a statistical engineering approach. Data collected during each entry were used to estimate response surface models characterizing vehicle performance, a novel contribution of statistical engineering applied to powered rotor-wing testing. Additionally, a 16- to 47-times reduction in the number of data points required was estimated when comparing a statistically-engineered approach to a conventional one-factor-at-a-time approach. |
Mr. Ray Rhew NASA |
Breakout | materials | 2016 | ||||||||
Leveraging Design for Variation to Improve both Testing and Design – A Case Study on Probabilistic Design of Bearings (Abstract)
In this case study we demonstrate an application of Pratt & Whitney’s “Design for Variation” discipline applied to the task of a roller bearing design. The ultimate goal, in this application, was to utilize test data from the “real world” to calibrate a computer model used for design and ensure that roller bearing designs obtained from this model were optimized for maximum robustness to major sources of variation in bearing manufacture and operation. The “Design for Variation” process provides engineers with many useful analysis results even before real world data is applied: high fidelity sensitivity analysis, uncertainty analysis (quantifying a baseline risk of failing to meet design intent) and model verification. The combining of real world data and Bayesian statistical methods that Design for Variation employs to calibrate models, however, goes a step further, validating the accuracy of the model’s outputs and quantifying any bias between the model and the real world. As a result of this application, the designers were able to identify the sources of the bias and correct the model’s physics-based aspects to more accurately model reality. The improved model is now integrated into all successive bearing design activities. The benefits of this method, which only required a small amount of high quality test data, are now available to all present and future roller bearing designs. |
Dr. Jaime O’connell Pratt and Whitney |
Breakout | materials | 2016 | ||||||||
Importance of Modeling and Simulation in Testing and the Need for Rigorous Validation (Abstract)
Modeling and simulation (M&S) is often an important element of operational evaluations of effectiveness, suitability, survivability, and lethality. For example, the testing of new systems designed to operate against advanced foreign threats, as well as the testing of systems of systems, will involve the use of M&S to examine scenarios that cannot be created using live testing. In order to have an adequate understanding of, and confidence in, the results obtained from M&S, statistically rigorous techniques should be applied to the validation process wherever possible. Design of experiments methodologies should be employed to determine what live and simulation data are needed to support rigorous validation, and formal statistical tests should be used to compare live and simulated data. This talk will discuss the importance of M&S in operational testing through a few examples, and outline several statistically rigorous techniques for validation. |
Dr. Kelly McGinnity IDA |
Breakout | materials | 2016 | ||||||||
Operational Cybersecurity Testing (Abstract)
The key to acquiring a cybersecure system is the ability to drive considerations about security from the operational level into the tactics and procedures. For T&E to support development and acquisition decisions, we must also adopt the perspective that a cyberattack is an attack on the mission using technology. A well-defined process model linking tools, tasks, and operators to mission performance supports this perspective. We will discuss an approach based on best practices learned from various DHS programs. |
Mr. Alex Hoover DHS |
Breakout | materials | 2016 | ||||||||
Dr. Tye Botting Research Staff Member IDA |
Breakout | 2016 | ||||||||||
Recent Advances in Measuring Display Clutter (Abstract)
Display clutter has been defined as an unintended effect of display imagery that obscures or confuses other information or that is not relevant to the task at hand. Negative effects of clutter on user performance have been documented; however, some work suggests differential effects with workload variations and measurement method. Existing measures of clutter either focus on physical display characteristics or user perceptions, and they generally exhibit weak correlations with task performance, limiting their utility for application in safety-critical domains. These observations have led to a new integrated measure of clutter accounting for display data, user knowledge, and patterns of visual attention. Due to limited research on clutter effects in domains other than aviation, empirical studies have been conducted to evaluate the new measure in automobile driving. Data-driven measures and subjective perceptions of clutter were collected along with patterns of visual attention allocation when drivers searched ‘high’ and ‘low’ clutter navigation displays. The experimental paradigm was manipulated to include either presentation-based trials with static display images or use of a dynamic driving simulator. The new integrated measure was more strongly correlated with driver performance than other, previously developed measures of clutter. Results also revealed clutter to significantly alter attention and degrade performance with static displays but to have little to no effect in driving simulation. Findings corroborate trends in the literature that clutter has its greatest effects on behavior in domains requiring extended attention to displays, such as map search, compared to use of displays to support secondary tasks, such as navigation aids in driving. Integrating display data and user knowledge factors with patterns of attention shows promise for clutter measurement. |
Dr. David Kaber NCSU |
Breakout | materials | 2017 | ||||||||
Recent Advances in Measuring Display Clutter (Abstract)
Display clutter has been defined as an unintended effect of display imagery that obscures or confuses other information or that is not relevant to the task at hand. Negative effects of clutter on user performance have been documented; however, some work suggests differential effects with workload variations and measurement method. Existing measures of clutter either focus on physical display characteristics or user perceptions, and they generally exhibit weak correlations with task performance, limiting their utility for application in safety-critical domains. These observations have led to a new integrated measure of clutter accounting for display data, user knowledge, and patterns of visual attention. Due to limited research on clutter effects in domains other than aviation, empirical studies have been conducted to evaluate the new measure in automobile driving. Data-driven measures and subjective perceptions of clutter were collected along with patterns of visual attention allocation when drivers searched ‘high’ and ‘low’ clutter navigation displays. The experimental paradigm was manipulated to include either presentation-based trials with static display images or use of a dynamic driving simulator. The new integrated measure was more strongly correlated with driver performance than other, previously developed measures of clutter. Results also revealed clutter to significantly alter attention and degrade performance with static displays but to have little to no effect in driving simulation. Findings corroborate trends in the literature that clutter has its greatest effects on behavior in domains requiring extended attention to displays, such as map search, compared to use of displays to support secondary tasks, such as navigation aids in driving. Integrating display data and user knowledge factors with patterns of attention shows promise for clutter measurement. |
Dr. Carl Pankok | Breakout | materials | 2017 | ||||||||
Trust in Automation (Abstract)
This brief talk will focus on the process of human-machine trust in the context of automated intelligence tools. The trust process is multifaceted, and this talk will define concepts such as trust, trustworthiness, and trust behavior, and will examine how these constructs might be operationalized in user studies. The talk will walk through various aspects of what might make an automated intelligence tool more or less trustworthy. Further, the construct of transparency will be discussed as a mechanism to foster shared awareness and shared intent between humans and machines. |
Dr. Joseph Lyons Technical Advisor Air Force Research Laboratory |
Breakout | materials | 2017 | ||||||||
Dose-Response Model of Recent Sonic Boom Community Annoyance Data (Abstract)
To enable quiet supersonic passenger flight overland, NASA is providing national and international noise regulators with a low-noise sonic boom database. The database will consist of dose-response curves, which quantify the relationship between low-noise sonic boom exposure and community annoyance. The recently-updated international standard for environmental noise assessment, ISO 1996-1:2016, references multiple fitting methods for dose-response analysis. One of these fitting methods, Fidell’s community tolerance level method, is based on theoretical assumptions that fix the slope of the curve, allowing only the intercept to vary. This fitting method is applied to an existing pilot sonic boom community annoyance data set from 2011 with a small sample size. The purpose of this exercise is to develop data collection and analysis recommendations for future sonic boom community annoyance surveys. |
Dr. Jonathan Rathsam NASA |
Breakout | 2017 | |||||||||
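To make the fixed-slope idea concrete, here is a hedged sketch of fitting a logistic dose-response curve by maximum likelihood with the slope held constant and only the intercept free, which is the essential feature of the community tolerance level approach mentioned above. The slope value and the simulated annoyance data are placeholders, not the ISO 1996-1 formulation or NASA's survey data.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import expit

rng = np.random.default_rng(7)

# Simulated survey: boom exposure level (dB) and whether each respondent was highly annoyed.
level = rng.uniform(65, 95, size=300)
true_b0, true_b1 = -20.0, 0.25
annoyed = rng.random(300) < expit(true_b0 + true_b1 * level)

FIXED_SLOPE = 0.25  # assumed known and held fixed; only the intercept is estimated

def neg_log_lik(b0):
    p = expit(b0 + FIXED_SLOPE * level)
    return -np.sum(annoyed * np.log(p) + (~annoyed) * np.log1p(-p))

fit = minimize_scalar(neg_log_lik, bounds=(-40, 0), method="bounded")
b0_hat = fit.x
print("intercept estimate:", round(b0_hat, 2))
# Exposure level at which half the community would report high annoyance:
print("50%-annoyance level (dB):", round(-b0_hat / FIXED_SLOPE, 1))
```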
Overview of Statistical Validation Tools (Abstract)
When Modeling and Simulation (M&S) is used as part of operational evaluations of effectiveness, suitability, survivability, or lethality, the M&S capability should first be rigorously validated to ensure it represents the real world accurately enough for the intended use. Specifically, we need to understand and characterize the usefulness and limitations of the M&S, especially in terms of uncertainty. Many statistical techniques are available to compare M&S output with live test data. This presentation will describe and present results from a simulation study conducted to determine which techniques provide the highest statistical power to detect differences in mean and variance between live and simulated data for a variety of data types and sizes. |
Dr. Kelly McGinnity IDA |
Breakout | materials | 2017 | ||||||||
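A small sketch of the kind of simulation study the abstract describes, assuming normally distributed live and simulated data: it estimates by Monte Carlo the power of a Welch t-test to detect a mean shift and of Levene's test to detect a variance change. The sample sizes and effect sizes are illustrative placeholders.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_live, n_sim, reps, alpha = 20, 200, 2000, 0.05
mean_shift, var_ratio = 0.5, 2.0

reject_t = reject_lev = 0
for _ in range(reps):
    live = rng.normal(mean_shift, np.sqrt(var_ratio), n_live)  # "live" data: shifted, inflated
    sim = rng.normal(0.0, 1.0, n_sim)                          # M&S output as the reference
    if stats.ttest_ind(live, sim, equal_var=False).pvalue < alpha:
        reject_t += 1
    if stats.levene(live, sim).pvalue < alpha:
        reject_lev += 1

print("power to detect mean shift (Welch t)     :", reject_t / reps)
print("power to detect variance change (Levene) :", reject_lev / reps)
```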
Improving the Rigor of Navy M&S VV&A through the application of Design of Experiments Methodologies and related Statistical Techniques |
Dr. Stargel Doane COTF |
Breakout | materials | 2017 | ||||||||
Validation of AIM-9X Modeling and Simulation (Abstract)
One use for Modeling and Simulation (M&S) in Test and Evaluation (T&E) is to produce weapon miss distances to evaluate the effectiveness of a weapon. This is true for the Air Intercept Missile-9X (AIM-9X) T&E community. Since flight testing is expensive, the test program uses relatively few flight tests at critical conditions and supplements those data with large numbers of miss distances from simulated tests across the weapon’s operational space. However, before the model and simulation are used to predict performance they must first be validated. Validation is an especially daunting task when working with a limited amount of live test data. In this presentation we show that even with a limited number of live test points (e.g., 16 missile firings), we can still perform a statistical analysis for the validation. Specifically, we introduce a validation technique known as Fisher’s Combined Probability Test and show how to apply Fisher’s test to validate the AIM-9X model and simulation. |
Dr. Rebecca Dickinson Research Staff Member IDA |
Breakout | materials | 2017 | ||||||||
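Fisher's Combined Probability Test itself is standard, so a direct sketch is possible: each live shot is assumed to yield a p-value measuring how consistent its miss distance is with the simulation's predicted distribution, and the p-values are then pooled. The p-values below are randomly generated placeholders, not AIM-9X data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Placeholder: one p-value per live missile firing, e.g. from comparing each live miss
# distance against the simulation-predicted distribution at the same conditions.
p_values = rng.uniform(0, 1, size=16)

# Fisher's statistic: -2 * sum(log p_i) ~ chi-square with 2k degrees of freedom under H0.
chi2_stat = -2.0 * np.sum(np.log(p_values))
combined_p = stats.chi2.sf(chi2_stat, df=2 * p_values.size)
print("Fisher chi-square:", round(chi2_stat, 2), " combined p-value:", round(combined_p, 3))

# scipy provides the same calculation directly:
stat, p = stats.combine_pvalues(p_values, method="fisher")
print("scipy combine_pvalues:", round(stat, 2), round(p, 3))
```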
Software Reliability Modeling (Abstract)
Many software reliability models characterize the number of faults detected during the testing process as a function of testing time, which is performed over multiple stages. Typically, the later stages are progressively more expensive because of the increased number of personnel and equipment required to support testing as the system nears completion. Such transitions from one stage of testing to the next change the operational environment. One statistical approach to combine software reliability growth models in a manner capable of characterizing multi-stage testing is the concept of a change-point process, where the intensity of a process experiences a distinct change at one or more discrete times during testing. Thus, change-point processes can be used to model changes in the failure rate of software due to changes in the testing strategy and environment, integration testing, and resource allocation as it proceeds through multiple stages of testing. This presentation generalizes change-point models to the heterogeneous case, where fault detection before and after a change-point can be characterized by distinct nonhomogeneous Poisson processes (NHPP). Experimental results suggest that heterogeneous change-point models better characterize some failure data sets, which can improve the applicability of software reliability models to large-scale software systems that are tested over multiple stages. |
Dr. Lance Fiondella UMASS |
Breakout | materials | 2017 | ||||||||
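A deliberately simplified sketch of the change-point idea: fault-detection times from two testing stages are modeled as a Poisson process whose constant intensity shifts at an unknown time, and the change point is estimated by profile maximum likelihood. Real software reliability growth models use nonhomogeneous intensities such as NHPP mean value functions; the two-rate version below only shows the mechanics.

```python
import numpy as np

rng = np.random.default_rng(3)
T, tau_true, lam1, lam2 = 100.0, 60.0, 0.8, 2.0

# Simulate fault-detection times: rate lam1 before tau_true, lam2 after.
t1 = np.sort(rng.uniform(0, tau_true, rng.poisson(lam1 * tau_true)))
t2 = np.sort(rng.uniform(tau_true, T, rng.poisson(lam2 * (T - tau_true))))
times = np.concatenate([t1, t2])

def profile_loglik(tau):
    n1 = np.sum(times <= tau)
    n2 = times.size - n1
    l1, l2 = n1 / tau, n2 / (T - tau)   # stage-wise MLE rates for a given change point
    ll = 0.0
    if n1:
        ll += n1 * np.log(l1)
    if n2:
        ll += n2 * np.log(l2)
    return ll - l1 * tau - l2 * (T - tau)

grid = np.linspace(5, 95, 901)
tau_hat = grid[np.argmax([profile_loglik(tau) for tau in grid])]
print("estimated change point:", round(tau_hat, 1), " (true:", tau_true, ")")
```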
Reliability Growth Modeling (Abstract)
Several optimization models are described for allocating resources to different testing activities in a system’s reliability growth program. These models assume availability of an underlying reliability growth model for the system, and capture the tradeoffs associated with focusing testing resources at various levels (e.g., system, subsystem, component) and/or how to divide resources within a given level. In order to demonstrate insights generated by solving the model, we apply the optimization models to an example series-parallel system in which reliability growth is assumed to follow the Crow/AMSAA reliability growth model. We then demonstrate how the optimization models can be extended to incorporate uncertainty in Crow/AMSAA parameters. |
Dr. Kelly Sullivan University of Arkansas |
Breakout | materials | 2017 | ||||||||
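Because the optimization models assume an underlying Crow/AMSAA (power-law NHPP) growth model, the standard time-truncated maximum likelihood estimates for that model are sketched below; the failure times are invented for illustration.

```python
import numpy as np

# Cumulative failure times (hours) from a single system on test, truncated at T hours.
t = np.array([4.3, 10.1, 19.7, 33.0, 51.2, 77.9, 109.4, 148.8, 203.5, 260.1])
T = 300.0
n = t.size

# Crow/AMSAA MLEs for the power-law intensity lambda * beta * t**(beta - 1).
beta_hat = n / np.sum(np.log(T / t))
lam_hat = n / T**beta_hat

# Instantaneous (current) MTBF at the end of the test.
mtbf_now = 1.0 / (lam_hat * beta_hat * T**(beta_hat - 1))
print(f"beta = {beta_hat:.3f} (beta < 1 indicates reliability growth)")
print(f"lambda = {lam_hat:.4f}, instantaneous MTBF at T = {mtbf_now:.1f} h")
```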
Updating R and Reliability Training with Bill Meeker (Abstract)
Since its publication, Statistical Methods for Reliability Data by W. Q. Meeker and L. A. Escobar has been recognized as a foundational resource for analyzing failure-time and survival data. Along with the text, the authors provided an S-Plus software package, called SPLIDA, to help readers utilize the methods presented in the text. Today, R is the most popular statistical computing language in the world, largely supplanting S-Plus. The SMRD package is the result of a multi-year effort to completely rebuild SPLIDA to take advantage of the improved graphics and workflow capabilities available in R. This presentation introduces the SMRD package, outlines the improvements, and shows how the package works seamlessly with the rmarkdown and shiny packages to dramatically speed up your workflow. The presentation concludes with a discussion of the improvements that still need to be made prior to publishing the package on CRAN. |
Dr. Jason Freels AFIT |
Breakout | materials | 2017 | ||||||||
Introduction to Bayesian Statistics (Abstract)
One of the most powerful features of Bayesian analyses is the ability to combine multiple sources of information in a principled way to perform inference. For example, this feature can be particularly valuable in assessing the reliability of systems where testing is limited for some reason (e.g., expense, treaty). At their most basic, Bayesian methods for reliability develop informative prior distributions using expert judgment or similar systems. Appropriate models allow the incorporation of many other sources of information, including historical data, information from similar systems, and computer models. I will introduce the approach and then consider examples from defense acquisition and lifecycle extension, focusing on the strengths and weaknesses of the Bayesian analyses. |
Dr. Alyson Wilson North Carolina State University |
Breakout | materials | 2017 | ||||||||
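A minimal sketch of the most basic case the abstract mentions: an informative Beta prior on a pass/fail reliability, updated with limited test data via the conjugate Beta-Binomial rule. The prior parameters and test results are placeholders.

```python
from scipy import stats

# Informative prior from expert judgment or similar systems: Beta(a, b).
a_prior, b_prior = 9.0, 1.0          # prior mean reliability 0.90

# Limited new test data: 18 successes in 20 trials.
successes, trials = 18, 20

a_post = a_prior + successes
b_post = b_prior + (trials - successes)
posterior = stats.beta(a_post, b_post)

print("posterior mean reliability:", round(posterior.mean(), 3))
lo, hi = posterior.ppf([0.05, 0.95])
print("90% credible interval: ({:.3f}, {:.3f})".format(lo, hi))
```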
How do the Framework and Design of Experiments Fundamentally Help? (Abstract)
The Military Global Positioning System (GPS) User Equipment (MGUE) program is the user segment of the GPS Enterprise—a program in the Deputy Assistant Secretary of Defense for Developmental Test and Evaluation (DASD(DT&E)) Space and Missile Defense Systems portfolio. The MGUE program develops and tests GPS cards capable of using Military-Code (M Code) and legacy signals. The program’s DT&E strategy is challenging. The GPS cards provide new, untested capabilities. Milestone A was approved in 2012 with sole-source contracts released to three vendors for Increment 1. An Acquisition Decision Memorandum directs the program to support a Congressional Mandate to provide GPS M Code-capable equipment for use after FY17. Increment 1 provides GPS receiver form factors for the ground domain interface as well as for the aviation and maritime domain interface. When reviewing the DASD(DT&E) Milestone B (MS B) Assessment Report, Mr. Kendall expressed curiosity about how the Developmental Evaluation Framework (DEF) and Design of Experiments (DOE) help. This presentation describes how the DEF and DOE methods help produce more informative and more economical developmental tests than those originally under consideration by the test community—decision-quality information with a 60% reduction in test cycle time. It provides insight into how the integration of the DEF and DOE improved the overall effectiveness of the DT&E strategy, illustrates the role of modeling and simulation (M&S) in the test design process, provides examples of experiment designs for different functional and performance areas, and illustrates the logic involved in balancing risks and test resources. The DEF and DOE methods enable the DT&E strategy to fully exploit early discovery, to maximize verification and validation opportunities, and to characterize system behavior across the technical requirements space. |
Dr. Luis A. Cortes | Breakout | 2017 | |||||||||
How do the Framework and Design of Experiments Fundamentally Help? (Abstract)
The Military Global Positioning System (GPS) User Equipment (MGUE) program is the user segment of the GPS Enterprise—a program in the Deputy Assistant Secretary of Defense for Developmental Test and Evaluation (DASD(DT&E)) Space and Missile Defense Systems portfolio. The MGUE program develops and tests GPS cards capable of using Military-Code (M Code) and legacy signals. The program’s DT&E strategy is challenging. The GPS cards provide new, untested capabilities. Milestone A was approved in 2012 with sole-source contracts released to three vendors for Increment 1. An Acquisition Decision Memorandum directs the program to support a Congressional Mandate to provide GPS M Code-capable equipment for use after FY17. Increment 1 provides GPS receiver form factors for the ground domain interface as well as for the aviation and maritime domain interface. When reviewing the DASD(DT&E) Milestone B (MS B) Assessment Report, Mr. Kendall expressed curiosity about how the Developmental Evaluation Framework (DEF) and Design of Experiments (DOE) help. This presentation describes how the DEF and DOE methods help produce more informative and more economical developmental tests than those originally under consideration by the test community—decision-quality information with a 60% reduction in test cycle time. It provides insight into how the integration of the DEF and DOE improved the overall effectiveness of the DT&E strategy, illustrates the role of modeling and simulation (M&S) in the test design process, provides examples of experiment designs for different functional and performance areas, and illustrates the logic involved in balancing risks and test resources. The DEF and DOE methods enable the DT&E strategy to fully exploit early discovery, to maximize verification and validation opportunities, and to characterize system behavior across the technical requirements space. |
Mr. Mike Sheeha MITRE |
Breakout | 2017 | |||||||||
Allocating Information Gathering Efforts for Selection Decisions (Abstract)
Selection decisions, such as procurement decisions, are often based on multiple performance attributes whose values are estimated using data (samples) collected through experimentation. Because the sampling (measurement) process has uncertainty, more samples provide better information. With a limited test budget to collect information to support such a selection decision, determining the number of samples to observe from each alternative and attribute is a critical information gathering decision. In this talk we present a sequential allocation scheme that uses Bayesian updating and maximizes the probability of selecting the true best alternative when the attribute value samples contain Gaussian measurement error. In this sequential approach, the test-designer uses the current knowledge of the attribute values to identify which attribute and alternative to sample next; after that sample, the test-designer chooses another attribute and alternative to sample, and this continues until no more samples can be made. We present the results of a simulation study that illustrates the performance advantage of the proposed sequential allocation scheme over simpler and more common fixed allocation approaches. |
Dr. Dennis Leber NIST |
Breakout | materials | 2017 | ||||||||
Allocating Information Gathering Efforts for Selection Decisions (Abstract)
Selection decisions, such as procurement decisions, are often based on multiple performance attributes whose values are estimated using data (samples) collected through experimentation. Because the sampling (measurement) process has uncertainty, more samples provide better information. With a limited test budget to collect information to support such a selection decision, determining the number of samples to observe from each alternative and attribute is a critical information gathering decision. In this talk we present a sequential allocation scheme that uses Bayesian updating and maximizes the probability of selecting the true best alternative when the attribute value samples contain Gaussian measurement error. In this sequential approach, the test-designer uses the current knowledge of the attribute values to identify which attribute and alternative to sample next; after that sample, the test-designer chooses another attribute and alternative to sample, and this continues until no more samples can be made. We present the results of a simulation study that illustrates the performance advantage of the proposed sequential allocation scheme over simpler and more common fixed allocation approaches. |
Dr. Jeffrey Herrmann University of Maryland |
Breakout | materials | 2017 | ||||||||
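The authors' allocation scheme is not reproduced here. The sketch below only illustrates the ingredients the abstract names, assuming a single attribute with Gaussian measurement error and conjugate normal updating, plus a simple heuristic rule (sample whichever of the two current leaders has the larger posterior variance); it compares the resulting probability of correct selection against a fixed round-robin allocation by simulation.

```python
import numpy as np

rng = np.random.default_rng(11)
true_values = np.array([1.0, 1.2, 0.8])    # hypothetical attribute values; index 1 is truly best
noise_sd, budget, reps = 1.0, 30, 2000

def run_once(sequential):
    mean = np.zeros(3)                     # vague normal priors on each alternative's value
    var = np.full(3, 100.0)
    for k in range(budget):
        if sequential:
            leaders = np.argsort(mean)[-2:]          # two best alternatives by posterior mean
            i = leaders[np.argmax(var[leaders])]     # sample the more uncertain of the two
        else:
            i = k % 3                                # fixed round-robin allocation
        y = rng.normal(true_values[i], noise_sd)     # noisy measurement of alternative i
        post_var = 1.0 / (1.0 / var[i] + 1.0 / noise_sd**2)   # conjugate normal update
        mean[i] = post_var * (mean[i] / var[i] + y / noise_sd**2)
        var[i] = post_var
    return np.argmax(mean) == np.argmax(true_values)

for seq in (False, True):
    pcs = np.mean([run_once(seq) for _ in range(reps)])
    label = "sequential heuristic" if seq else "fixed round-robin   "
    print(label, "P(correct selection) =", round(float(pcs), 3))
```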
Statistical Methods for Programmatic Assessment and Anomaly Detection |
Mr. Douglas Brown BAH |
Breakout | materials | 2017 | ||||||||
Statistical Methods for Programmatic Assessment and Anomaly Detection |
Dr. Ray McCollum BAH |
Breakout | materials | 2017 | ||||||||
Flight Test and Evaluation of Airborne Spacing Application (Abstract)
NASA’s Airspace Technology Demonstration (ATD) project was developed to facilitate the transition of mature air traffic management technologies from the laboratory to operational use. The first ATD focused on an integrated set of advanced NASA technologies to enable efficient arrival operations in high-density terminal airspace. This integrated arrival solution was validated and verified in laboratories and transitioned to a field prototype for an operational demonstration. Within NASA, this was a collaborative effort between Ames and Langley Research Centers involving a multi-year iterative experimentation process consisting of a series of sequential batch computer simulations and human-in-the-loop experiments, culminating in a flight test. Designing and analyzing the flight test involved a number of statistical challenges. There were several variables which are known to impact the performance of the system, but which could not be controlled in an operational environment. Changes in the schedule due to weather and the dynamic positioning of the aircraft on the arrival routes resulted in the need for a design that could be modified in real-time. This presentation describes a case study from a recent NASA flight test, highlights statistical challenges, and discusses lessons learned. |
Dr. Sara Wilson NASA |
Breakout | materials | 2017 | ||||||||
Testing and Estimation in Sequential High-Dimension Data (Abstract)
Many modern processes generate complex data records not readily analyzed by traditional techniques. For example, a single observation from a process might be a radar signal consisting of n pairs of bivariate data described via some functional relation between reflection and direction. Methods are examined here for detecting changes in such sequences from some known or estimated nominal state. Additionally, estimates of the degree of change (scale, location, frequency, etc.) are desirable and discussed. The proposed methods are designed to take advantage of all available data in a sequence. This can become unwieldy for long sequences of large-sized observations, so dimension reduction techniques are needed. In order for these methods to be as widely applicable as possible, we make limited distributional assumptions and so we propose new nonparametric and Bayesian tools to implement these estimators. |
Dr. Eric Chicken Florida State University |
Breakout | materials | 2017 | ||||||||
Blast Noise Event Classification from a Spectrogram (Abstract)
Spectrograms (i.e., squared magnitude of short-time Fourier transform) are commonly used as features to classify audio signals in the same way that social media companies (e.g., Google, Facebook, Yahoo) use images to classify or automatically tag people in photos. However, a serious problem arises when using spectrograms to classify acoustic signals, in that the user must choose the input parameters (hyperparameters), and such choices can have a drastic effect on the accuracy of the resulting classifier. Further, considering all possible combinations of the hyperparameters is a computationally intractable problem. In this study, we simplify the problem making it computationally tractable, explore the utility of response surface methods for sampling the hyperparameter space, and find that response surface methods are a computationally efficient means of identifying the hyperparameter combinations that are likely to give the best classification results. |
Dr. Edward Nykaza Army Engineering Research and Development Center, Construction Engineering Research Laboratory |
Breakout | materials | 2017 | ||||||||
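A short sketch of the spectrogram hyperparameters the abstract refers to, computed with scipy on a synthetic signal. In the study, settings like window length and overlap are the factors one would vary with response surface methods before passing the spectrograms to a classifier; the signal and settings below are invented.

```python
import numpy as np
from scipy import signal

fs = 8000                                    # sample rate (Hz)
t = np.arange(0, 2.0, 1 / fs)
# Synthetic "blast-like" event: a decaying low-frequency burst in background noise.
x = np.random.default_rng(5).normal(0, 0.2, t.size)
x[4000:6000] += np.exp(-np.linspace(0, 5, 2000)) * np.sin(2 * np.pi * 80 * t[4000:6000])

# Hyperparameters that drive classifier accuracy and would be tuned via RSM:
nperseg = 256        # window length
noverlap = 128       # window overlap
window = "hann"      # window shape

f, tt, Sxx = signal.spectrogram(x, fs=fs, window=window,
                                nperseg=nperseg, noverlap=noverlap)
features = 10 * np.log10(Sxx + 1e-12)        # dB-scaled spectrogram as classifier input
print("spectrogram feature shape (freq bins x time frames):", features.shape)
```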
Automated Software Testing Best Practices and Framework: A STAT COE Project (Abstract)
The process for testing military systems which are largely software intensive involves techniques and procedures often different from those for hardware-based systems. Much of the testing can be performed in laboratories at many of the acquisition stages, up to operational testing. Testing software systems is not different from testing hardware-based systems in that testing earlier and more intensively benefits the acquisition program in the long run. Automated testing of software systems enables more frequent and more extensive testing, allowing for earlier discovery of errors and faults in the code. Automated testing is beneficial for unit, integrated, functional and performance testing, but there are costs associated with automation tool license fees, specialized manpower, and the time to prepare and maintain the automation scripts. This presentation discusses some of the features unique to automated software testing and offers a framework organizations can implement to make the business case for, to organize for, and to execute and benefit from automating the right aspects of their testing needs. Automation has many benefits in saving time and money, but is most valuable in freeing test resources to perform higher value tasks. |
Dr. Jim Simpson JK Analytics |
Breakout | materials | 2017 | ||||||||
Automated Software Testing Best Practices and Framework: A STAT COE Project (Abstract)
The process for testing military systems which are largely software intensive involves techniques and procedures often different from those for hardware-based systems. Much of the testing can be performed in laboratories at many of the acquisition stages, up to operational testing. Testing software systems is not different from testing hardware-based systems in that testing earlier and more intensively benefits the acquisition program in the long run. Automated testing of software systems enables more frequent and more extensive testing, allowing for earlier discovery of errors and faults in the code. Automated testing is beneficial for unit, integrated, functional and performance testing, but there are costs associated with automation tool license fees, specialized manpower, and the time to prepare and maintain the automation scripts. This presentation discusses some of the features unique to automated software testing and offers a framework organizations can implement to make the business case for, to organize for, and to execute and benefit from automating the right aspects of their testing needs. Automation has many benefits in saving time and money, but is most valuable in freeing test resources to perform higher value tasks. |
Dr. Jim Wisnowski Adsurgo |
Breakout | materials | 2017 | ||||||||
Software Test Techniques (Abstract)
In recent years, software testing techniques based on formal methods have made their way into industrial practice as a supplement to system and unit testing. I will discuss three core techniques that have proven particularly amenable to transition: 1) Concolic execution, which enables the automatic generation of high-coverage test suites; 2) Property-based randomized testing, which automatically checks sequences of API calls to ensure that expected high-level behavior occurs; and 3) Bounded model checking, which enables systematic exploration of both concrete systems and high-level models to check temporal properties, including ordering of events and timing requirements. |
Dr. Jose Calderon Galois |
Breakout | materials | 2017 | ||||||||
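Of the three techniques listed, property-based randomized testing is the easiest to sketch briefly. The example below uses the Python hypothesis library, which is an assumed stand-in since the talk is not tied to a particular tool; it checks simple invariants of a small function over automatically generated inputs.

```python
# pip install hypothesis pytest  (run with: pytest this_file.py)
from hypothesis import given, strategies as st

def dedupe_preserving_order(items):
    """Function under test: remove duplicates while keeping first occurrences in order."""
    seen, out = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

@given(st.lists(st.integers()))
def test_dedupe_properties(xs):
    result = dedupe_preserving_order(xs)
    # Property 1: no element appears twice in the output.
    assert len(result) == len(set(result))
    # Property 2: the output is a subsequence of the input (order preserved).
    it = iter(xs)
    assert all(any(x == y for y in it) for x in result)
    # Property 3: every distinct input element is kept.
    assert set(result) == set(xs)
```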
Combinational Testing (Abstract)
Combinatorial methods have attracted attention as a means of providing strong assurance at reduced cost. Combinatorial testing takes advantage of the interaction rule, which is based on analysis of thousands of software failures. The rule states that most failures are induced by single-factor faults or by the joint combinatorial effect (interaction) of two factors, with progressively fewer failures induced by interactions between three or more factors. Therefore, if all faults in a system can be induced by a combination of t or fewer parameters, then testing all t-way combinations of parameter values is pseudo-exhaustive and provides a high rate of fault detection. The talk explains the background, methods, and tools available for combinatorial testing. New results on using combinatorial methods for oracle-free testing of certain types of applications will also be introduced. |
Dr. Raghu Kacker NIST |
Breakout | materials | 2017 | ||||||||
Combinational Testing (Abstract)
Combinatorial methods have attracted attention as a means of providing strong assurance at reduced cost. Combinatorial testing takes advantage of the interaction rule, which is based on analysis of thousands of software failures. The rule states that most failures are induced by single-factor faults or by the joint combinatorial effect (interaction) of two factors, with progressively fewer failures induced by interactions between three or more factors. Therefore, if all faults in a system can be induced by a combination of t or fewer parameters, then testing all t-way combinations of parameter values is pseudo-exhaustive and provides a high rate of fault detection. The talk explains the background, methods, and tools available for combinatorial testing. New results on using combinatorial methods for oracle-free testing of certain types of applications will also be introduced. |
Dr. Rick Kuhn NIST |
Breakout | materials | 2017 | ||||||||
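A hedged sketch of 2-way (pairwise) test generation using a simple random-greedy heuristic over a hypothetical four-parameter configuration space; it is not one of the tools discussed in the talk, but it shows why a pairwise suite is far smaller than exhaustive testing: each added test is chosen to cover as many not-yet-covered parameter-value pairs as possible.

```python
import itertools, random

random.seed(0)
parameters = {                      # hypothetical configuration space for a software test
    "os": ["win", "linux", "mac"],
    "browser": ["chrome", "firefox"],
    "net": ["wifi", "wired", "lte"],
    "auth": ["none", "basic", "token"],
}
names = list(parameters)

# Every 2-way parameter-value pair that must be covered at least once.
uncovered = {((p1, v1), (p2, v2))
             for p1, p2 in itertools.combinations(names, 2)
             for v1 in parameters[p1] for v2 in parameters[p2]}

def pairs_of(test):
    return {((p1, test[p1]), (p2, test[p2])) for p1, p2 in itertools.combinations(names, 2)}

suite = []
while uncovered:
    # Random-greedy: propose candidates, keep the one covering the most uncovered pairs.
    candidates = [{p: random.choice(v) for p, v in parameters.items()} for _ in range(50)]
    best = max(candidates, key=lambda t: len(pairs_of(t) & uncovered))
    gain = pairs_of(best) & uncovered
    if gain:
        suite.append(best)
        uncovered -= gain

print(f"{len(suite)} tests cover all 2-way combinations (exhaustive would need 3*2*3*3 = 54)")
for test in suite:
    print(test)
```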
Introduction to Human Measurement |
Dr. Cynthia Null NASA |
Breakout | 2017 | |||||||||
The (Empirical) Case for Analyzing Likert-Type Data with Parametric Tests (Abstract)
Surveys are commonly used to evaluate the quality of human-system interactions during the operational testing of military systems. Testers use Likert-type response options to measure the intensity of operators’ subjective experiences (e.g., usability, workload) while operating the system. Recently, appropriate methods for analyzing Likert data have become a point of contention within the operational test community. Some argue that Likert data can be analyzed with parametric techniques, whereas others argue that only non-parametric techniques should be used. However, the reasons stated for holding a particular view are rarely tied to findings in the empirical literature. This presentation sheds light on the debate by reviewing existing research on how parametric statistics affect the conclusions drawn from Likert data, and it debunks common myths and misunderstandings about the nature of Likert data within the operational test community and academia. |
Dr. Heather Wojton Research Staff Member IDA |
Breakout | materials | 2017 | ||||||||
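A short sketch of the kind of evidence reviewed in this debate, assuming invented 5-point Likert response distributions for two groups: it simulates repeated experiments and compares the rejection rates of a parametric (Welch t) and a non-parametric (Mann-Whitney U) test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(21)
levels = np.arange(1, 6)
p_a = [0.10, 0.20, 0.40, 0.20, 0.10]      # group A response probabilities
p_b = [0.05, 0.15, 0.30, 0.30, 0.20]      # group B shifted toward agreement
n, reps, alpha = 30, 2000, 0.05

rej_t = rej_u = 0
for _ in range(reps):
    a = rng.choice(levels, size=n, p=p_a)
    b = rng.choice(levels, size=n, p=p_b)
    if stats.ttest_ind(a, b, equal_var=False).pvalue < alpha:
        rej_t += 1
    if stats.mannwhitneyu(a, b, alternative="two-sided").pvalue < alpha:
        rej_u += 1

print("power, Welch t-test   :", rej_t / reps)
print("power, Mann-Whitney U :", rej_u / reps)
```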
The System Usability Scale: A measurement Instrument Should Suit the Measurement Needs (Abstract)
The System Usability Scale (SUS) was developed by John Brooke in 1986 “to take a quick measurement of how people perceived the usability of (office) computer systems on which they were working.” The SUS is a 10-item, generic usability scale that is assumed to be system agnostic, and it results in a numerical score that ranges from 0-100. It has been widely employed and researched with non-military systems. More recently, it has been strongly recommended for use with military systems in operational test and evaluation, in part because of its widespread commercial use, but largely because it produces a numerical score that makes it amenable to statistical operations. Recent lessons learned with the SUS in operational test and evaluation strongly question its use with military systems, most of which differ radically from non-military systems. More specifically, (1) usability measurement attributes need to be tailored to the specific system under test and meet the information needs of system users, and (2) a SUS numerical cutoff score of 70—a common benchmark with non-military systems—does not accurately reflect “system usability” from an operator or test team perspective. These findings will be discussed in a psychological and human factors measurement context, and an example of system-specific usability attributes will be provided as a viable way forward. In the event that the SUS is used in operational test and evaluation, some recommendations for interpreting the outcomes will be provided. |
Mr. Keith Kidder AFOTEC |
Breakout | materials | 2017 | ||||||||
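For reference, the standard SUS scoring arithmetic behind the 0-100 score discussed above, as published by Brooke: odd-numbered (positively worded) items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5.

```python
def sus_score(responses):
    """responses: ten ratings on a 1-5 scale, in questionnaire order (items 1..10)."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses on a 1-5 scale")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Example: a single operator's ratings for one system under test.
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))   # -> 85.0
```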
Deterministic System Design of Experiments Based Frangible Joint Design Reliability Estimation (Abstract)
Frangible Joints are linear pyrotechnic devices used to separate launch vehicle and spacecraft stages and fairings. Advantages of these systems include low mass, low dynamic shock, and low debris. However the primary disadvantage for human space flight applications is the design’s use of a single explosive cord to effect function, rendering the device zero fault tolerant. Commercial company proposals to utilize frangible joints in human space flight applications spurred a NASA Engineering and Safety Center (NESC) assessment of the reliability of frangible joints. Empirical test and LS-DYNA based finite element analysis was used to understand and assess the design and function, and a deterministic system Design of Experiments (dsDOE) study was conducted to assess the sensitivity of function to frangible joint design variables and predict the device’s design reliability. The collaboration between statistical engineering experts and LS-DYNA analysis experts enabled a comprehensive understanding of these devices. |
Mr. Scott West Aerospace Corporation |
Breakout | materials | 2017 | ||||||||
Deterministic System Design of Experiments Based Frangible Joint Design Reliability Estimation (Abstract)
Frangible Joints are linear pyrotechnic devices used to separate launch vehicle and spacecraft stages and fairings. Advantages of these systems include low mass, low dynamic shock, and low debris. However the primary disadvantage for human space flight applications is the design’s use of a single explosive cord to effect function, rendering the device zero fault tolerant. Commercial company proposals to utilize frangible joints in human space flight applications spurred a NASA Engineering and Safety Center (NESC) assessment of the reliability of frangible joints. Empirical test and LS-DYNA based finite element analysis was used to understand and assess the design and function, and a deterministic system Design of Experiments (dsDOE) study was conducted to assess the sensitivity of function to frangible joint design variables and predict the device’s design reliability. The collaboration between statistical engineering experts and LS-DYNA analysis experts enabled a comprehensive understanding of these devices. |
Mr. Martin Annett Aerospace Corporation |
Breakout | materials | 2017 | ||||||||
Deterministic System Design of Experiments Based Frangible Joint Design Reliability Estimation (Abstract)
Frangible Joints are linear pyrotechnic devices used to separate launch vehicle and spacecraft stages and fairings. Advantages of these systems include low mass, low dynamic shock, and low debris. However the primary disadvantage for human space flight applications is the design’s use of a single explosive cord to effect function, rendering the device zero fault tolerant. Commercial company proposals to utilize frangible joints in human space flight applications spurred a NASA Engineering and Safety Center (NESC) assessment of the reliability of frangible joints. Empirical test and LS-DYNA based finite element analysis was used to understand and assess the design and function, and a deterministic system Design of Experiments (dsDOE) study was conducted to assess the sensitivity of function to frangible joint design variables and predict the device’s design reliability. The collaboration between statistical engineering experts and LS-DYNA analysis experts enabled a comprehensive understanding of these devices. |
Dr. James Womach Aerospace Corporation |
Breakout | materials | 2017 | ||||||||
Machine Learning: Overview and Applications to Test (Abstract)
Machine learning is quickly gaining importance in being able to infer meaning from large, high-dimensional datasets. It has even demonstrated performance meeting or exceeding human capabilities in conducting a particular set of tasks such as speech recognition and image recognition. Employing these machine learning capabilities can lead to increased efficiency in data collection, processing, and analysis. Presenters will provide an overview of common examples of supervised and unsupervised learning tasks and algorithms as an introduction to those without experience in machine learning. Presenters will also provide motivation for machine learning tasks and algorithms in a variety of test and evaluation settings. For example, in both developmental and operational test, restrictions on instrumentation, number of sorties, and the amount of time allocated to analyze collected data make data analysis challenging. When instrumentation is unavailable or fails, a common back-up data source is an over-the-shoulder video recording or recordings of aircraft intercom and radio transmissions, which traditionally are tedious to analyze. Machine learning based image and speech recognition algorithms can assist in extracting information quickly from hours of video and audio recordings. Additionally, unsupervised learning techniques may be used to aid in the identification of influences of logged or uncontrollable factors in many test and evaluation settings. Presenters will provide a potential example for the application of unsupervised learning techniques to test and evaluation. |
Lt. Takayuki Iguchi AFOTEC |
Breakout | materials | 2017 | ||||||||
Machine Learning: Overview and Applications to Test (Abstract)
Machine learning is quickly gaining importance in being able to infer meaning from large, high-dimensional datasets. It has even demonstrated performance meeting or exceeding human capabilities in conducting a particular set of tasks such as speech recognition and image recognition. Employing these machine learning capabilities can lead to increased efficiency in data collection, processing, and analysis. Presenters will provide an overview of common examples of supervised and unsupervised learning tasks and algorithms as an introduction to those without experience in machine learning. Presenters will also provide motivation for machine learning tasks and algorithms in a variety of test and evaluation settings. For example, in both developmental and operational test, restrictions on instrumentation, number of sorties, and the amount of time allocated to analyze collected data make data analysis challenging. When instrumentation is unavailable or fails, a common back-up data source is an over-the-shoulder video recording or recordings of aircraft intercom and radio transmissions, which traditionally are tedious to analyze. Machine learning based image and speech recognition algorithms can assist in extracting information quickly from hours of video and audio recordings. Additionally, unsupervised learning techniques may be used to aid in the identification of influences of logged or uncontrollable factors in many test and evaluation settings. Presenters will provide a potential example for the application of unsupervised learning techniques to test and evaluation. |
Lt. Megan Lewis AFOTEC |
Breakout | materials | 2017 | ||||||||
Range Adversarial Planning Tool for Autonomy Test and Evaluation |
Dr. Chad Hawthorne JHU/APL |
Breakout | 2017 | |||||||||
Search for Extended Test Design Methods for Complex Systems of Systems |
Dr. Alex Alaniz AFOTEC |
Breakout | materials | 2017 | ||||||||
The Future of Engineering at NASA Langley (Abstract)
In May 2016, the NASA Langley Research Center’s Engineering Director stood up a group consisting of employees within the directorate to assess the current state of engineering being done by the organization. The group was chartered to develop ideas, through investigation and benchmarking of other organizations within and outside of NASA, for how engineering should look in the future. This effort would include brainstorming, development of recommendations, and some detailed implementation plans which could be acted upon by the directorate leadership as part of an enduring activity. The group made slow and sporadic progress in several specific, self-selected areas including: training and development; incorporation of non-traditional engineering disciplines; capturing and leveraging historical data and knowledge; revolutionizing project documentation; and more effective use of design reviews. The design review investigations have made significant progress by leveraging lessons learned and techniques gained by collaboration with operations research analysts within the local Lockheed Martin Center for Innovation (the “Lighthouse”) and pairing those techniques with advanced data analysis tools available through the IBM Watson Content Analytics environment. Trials with these new techniques are underway but show promising results for the future of providing objective, quantifiable data from the design review environment – an environment which to this point has remained essentially unchanged for the past 50 years. |
Mr. Joe Gasbarre NASA |
Breakout | materials | 2017 | ||||||||
Combinatorial Testing for Link-16 Developmental Test and Evaluation (Abstract)
Due to small Tactical Data Link testing windows, only commonly used messages are tested, resulting in the evaluation of only a small subset of all possible Link 16 messages. To increase the confidence that software design and implementation issues are discovered in the earliest phases of government acceptance testing, the Marine Corps Tactical Systems Support Activity (MCTSSA) Instrumentation and Data Management Section (IDMS) successfully implemented an extension of the traditional form of Design of Experiments (DOE) called Combinatorial Testing (CT). CT was utilized to reduce the human bias and inconsistencies involved in Link 16 testing and replace them with a thorough test that can validate a system’s ability to properly consume all of the possible valid combinations of Link 16 message field values. MCTSSA’s unique team of subject matter experts was able to bring together the tenets of virtualization, automation, C4I Air systems testing, tactical data link testing, and Design of Experiments methodology to invent a testing paradigm that will exhaustively evaluate tactical Air systems. This presentation will give an overview of how CT was implemented for the test. |
Mr. Tim Mclean MCTSSA |
Breakout | materials | 2017 | ||||||||
Censored Data Analysis for Performance Data (Abstract)
Binomial metrics like probability-to-detect or probability-to-hit typically provide operationally meaningful and easy-to-interpret test outcomes. However, they are information-poor metrics and extremely expensive to test. The standard power calculations used to size a test employ hypothesis tests, which typically result in many tens to hundreds of runs. In addition to being expensive, the test is most likely inadequate for characterizing performance over a variety of conditions due to the inherently large statistical uncertainties associated with binomial metrics. A solution is to convert to a continuous variable, such as miss distance or time-to-detect. The common objection to switching to a continuous variable is that the hit/miss or detect/non-detect binomial information is lost, when the fraction of misses/non-detects is often the most important aspect of characterizing system performance. Furthermore, the new continuous metric appears to no longer be connected to the requirements document, which was stated in terms of a probability. These difficulties can be overcome with the use of censored data analysis. This presentation will illustrate the concepts and benefits of this approach and will demonstrate a simple analysis with data, including power calculations that show the cost savings of employing the methodology. |
Dr. Bram Lillard IDA |
Breakout | materials | 2017 | ||||||||
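A minimal sketch of the censored-data idea: time-to-detect is modeled as lognormal, and trials that end without a detection enter the likelihood through the survival function instead of being discarded, so the binomial-style detection probability falls out as a by-product. The data and censoring time are invented.

```python
import numpy as np
from scipy import stats, optimize

# Detection times (seconds); trials with no detection are right-censored at t_max.
detect_times = np.array([12.0, 8.5, 30.2, 15.1, 22.8, 9.9, 41.0, 18.3])
n_censored = 4                     # trials that timed out with no detection
t_max = 60.0

def neg_log_lik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    z_obs = (np.log(detect_times) - mu) / sigma
    # Observed detections: lognormal density. Non-detections: P(T > t_max).
    ll = np.sum(stats.norm.logpdf(z_obs) - np.log(sigma) - np.log(detect_times))
    ll += n_censored * stats.norm.logsf((np.log(t_max) - mu) / sigma)
    return -ll

res = optimize.minimize(neg_log_lik, x0=[np.log(20.0), 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
p_detect = stats.norm.cdf((np.log(t_max) - mu_hat) / sigma_hat)
print(f"median time-to-detect: {np.exp(mu_hat):.1f} s, "
      f"P(detect within {t_max:.0f} s): {p_detect:.2f}")
```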
Estimating the Distribution of an Extremum using a Peaks-Over-Threshold Model and Monte Carlo Simulation (Abstract)
Estimating the probability distribution of an extremum (maximum or minimum) over some fixed amount of time, using a single time series typically recorded for a shorter amount of time, is important in many application areas, e.g., structural design, reliability, quality, and insurance. When designing structural members, engineers are concerned with maximum wind effects, which are functions of wind speed. With respect to reliability and quality, extremes experienced during storage or transport, e.g., extreme temperatures, may substantially impact product quality, lifetime, or both. Insurance companies are of course concerned about very large claims. In this presentation, a method to estimate the distribution of an extremum using a well-known peaks-over-threshold (POT) model and Monte Carlo simulation is presented. Since extreme values have long been a subject of study, some brief history is first discussed. The POT model that underlies the approach is then laid out. A description of the algorithm follows. It leverages pressure data collected on scale models of buildings in a wind tunnel for context. Essentially, the POT model is fitted to the observed data and then used to simulate many time series of the desired length. The empirical distribution of the extrema is obtained from the simulated series. Uncertainty in the estimated distribution is quantified by a bootstrap algorithm. Finally, an R package implementing the computations is discussed. |
Dr. Adam Pintar NIST |
Breakout | materials | 2017 | ||||||||
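A compact sketch of the peaks-over-threshold workflow described above, using synthetic data: exceedances over a threshold are fit with a generalized Pareto distribution, exceedance counts are treated as Poisson, and many records of the target length are simulated to build the empirical distribution of the maximum. The bootstrap uncertainty step is omitted for brevity, and this is not the author's R package.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)

# Synthetic "observed" record, e.g. one year of hourly pressure coefficients.
obs = rng.gumbel(loc=0.0, scale=1.0, size=8760)
u = np.quantile(obs, 0.98)                      # threshold
exceed = obs[obs > u] - u
rate = exceed.size / obs.size                   # exceedances per observation

# Fit a generalized Pareto distribution to the exceedances (location fixed at 0).
c_hat, _, scale_hat = stats.genpareto.fit(exceed, floc=0)

# Simulate maxima over a longer target record (e.g., 10 "years").
target_len = 10 * obs.size
maxima = []
for _ in range(5000):
    k = rng.poisson(rate * target_len)          # number of exceedances in the target period
    peaks = stats.genpareto.rvs(c_hat, loc=0, scale=scale_hat, size=k, random_state=rng)
    maxima.append(u + peaks.max() if k > 0 else u)
maxima = np.array(maxima)

print("median 10-year maximum    :", round(np.median(maxima), 2))
print("95th percentile of maximum:", round(np.quantile(maxima, 0.95), 2))
```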
Design & Analysis of a Computer Experiment for an Aerospace Conformance Simulation Study (Abstract)
Within NASA’s Air Traffic Management Technology Demonstration # 1 (ATD-1), Interval Management (IM) is a flight deck tool that enables pilots to achieve or maintain a precise in-trail spacing behind a target aircraft. Previous research has shown that violations of aircraft spacing requirements can occur between an IM aircraft and its surrounding non-IM aircraft when it is following a target on a separate route. This talk focuses on the experimental design and analysis of a computer experiment which models the airspace configuration of interest in order to determine airspace/aircraft conditions leading to spacing violations during IM operation. We refer to multi-layered nested continuous factors as those that are continuous and ordered in their selection; they can only be selected sequentially with a level selected for one factor affecting the range of possible values for each subsequently nested factor. While each factor is nested within another factor, the exact nesting relationships have no closed form solution. In this talk, we describe our process of engineering an appropriate space-filling design for this situation. Using this space-filling design and Gaussian process modeling, we found that aircraft delay assignments and wind profiles significantly impact the likelihood of spacing violations and the interruption of IM operations. |
Dr. David Edwards Virginia Commonwealth University |
Breakout | materials | 2017 | ||||||||
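The nested-factor space-filling construction described in the talk has no off-the-shelf implementation, so the sketch below only shows the generic building blocks it combines, assuming a plain rectangular factor space: a Latin hypercube design and a Gaussian process surrogate fit to a stand-in for the airspace simulation. The factor names and ranges are invented.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)

# Space-filling design over three notional factors (delay, wind magnitude, spacing goal),
# scaled from the unit cube to engineering ranges.
sampler = qmc.LatinHypercube(d=3, seed=2)
unit = sampler.random(n=40)
lower, upper = [0.0, 0.0, 45.0], [120.0, 30.0, 120.0]
X = qmc.scale(unit, lower, upper)

# Stand-in for the expensive airspace simulation: returns a spacing-error response.
def simulate(x):
    delay, wind, goal = x
    return 0.02 * delay + 0.1 * wind - 0.01 * goal + rng.normal(0, 0.5)

y = np.array([simulate(x) for x in X])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=[50, 10, 30]) + WhiteKernel(),
                              normalize_y=True)
gp.fit(X, y)
pred, sd = gp.predict([[60.0, 15.0, 90.0]], return_std=True)
print(f"predicted spacing error: {pred[0]:.2f} +/- {sd[0]:.2f}")
```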
Optimal Multi-Response Designs (Abstract)
The problem of constructing a design for an experiment when multiple responses are of interest does not have a clear answer, particularly when the response variables are of different types. An experiment for an air-to-air missile simulation, for example, might measure the following responses simultaneously: hit or miss the target (a binary response) and the time to acquire the target (a continuous response). With limited time and resources and only one experiment possible, the question of selecting an appropriate design to model both responses is important. In this presentation, we discuss a method for creating designs when two responses, each with a different distribution (normal, binomial, or Poisson), are of interest. We demonstrate the proposed method using various weighting schemes for the two models to show how the designs change as the weighting scheme changes. In addition, we explore the effect of the specified priors for the nonlinear models on these designs. |
Dr. Sarah Burke STAT COE |
Breakout | 2017 | |||||||||
Augmenting Definitive Screening Designs (Abstract)
Jones and Nachtsheim (2011) introduced a class of three-level screening designs called definitive screening designs (DSDs). The structure of these designs results in the statistical independence of main effects and two-factor interactions; the absence of complete confounding among two-factor interactions; and the ability to estimate all quadratic effects. Because quadratic effects can be estimated, DSDs can allow the screening and optimization of a system to be performed in one step, but only when the number of terms found to be active during the screening phase of analysis is less than about half the number of runs required by the DSD (Errore, et al., 2016). Otherwise, estimation of second-order models requires augmentation of the DSD. In this paper we explore the construction of a series of augmented designs, moving from the starting DSD to designs capable of estimating the full second-order model. We use power calculations, model-robustness criteria, and model-discrimination criteria to determine the number of runs by which to augment in order to identify the active second-order effects with high probability. |
Ms. Abby Nachtsheim ASU |
Breakout | materials | 2017 | ||||||||
Integrated Uncertainty Quantification for Risk and Resource Management: Building Confidence in Design |
Dr. Eric Walker NASA |
Breakout | materials | 2017 | ||||||||
Background of NASA’s Juncture Flow Validation Test |
Mr. Joseph Morrison NASA |
Breakout | 2017 | |||||||||
A Study to Investigate the Use of CFD as a Surrogate for Wind Tunnel Testing in the High Supersonic Speed Regime |
Dr. Eric Walker NASA |
Breakout | materials | 2017 | ||||||||
A Study to Investigate the Use of CFD as a Surrogate for Wind Tunnel Testing in the High Supersonic Speed Regime |
Mr. Joseph Morrison NASA |
Breakout | materials | 2017 | ||||||||
Communicating Complex Statistical Methodologies to Leadership (Abstract)
More often than not, the data we analyze for the military is plagued with statistical issues. Multicollinearity, small sample sizes, quasi-experimental designs, and convenience samples are some examples of what we commonly see in military data. Many of these complications can be resolved either in the design or analysis stage with appropriate statistical procedures. But, to keep our work useful, usable, and transparent to the military leadership who sponsors it, we must strike the elusive balance between explaining and justifying our design and analysis techniques and not inundating our audience with unnecessary details. It can be even more difficult to get military leadership to understand the statistical problems and solutions so well that they are enthused and supportive of our approaches. Using literature written on the subject as well as a variety of experiences, we will showcase several examples, as well as present ideas for keeping our clients actively engaged in statistical methodology discussions. |
Dr. Jane Pinelis Johns Hopkins University Applied Physics Laboratory (JHU/APL) |
Breakout | materials | 2017 | ||||||||
Communicating Complex Statistical Methodologies to Leadership (Abstract)
More often than not, the data we analyze for the military is plagued with statistical issues. Multicollinearity, small sample sizes, quasi-experimental designs, and convenience samples are some examples of what we commonly see in military data. Many of these complications can be resolved either in the design or analysis stage with appropriate statistical procedures. But, to keep our work useful, usable, and transparent to the military leadership who sponsors it, we must strike the elusive balance between explaining and justifying our design and analysis techniques and not inundating our audience with unnecessary details. It can be even more difficult to get military leadership to understand the statistical problems and solutions so well that they are enthused and supportive of our approaches. Using literature written on the subject as well as a variety of experiences, we will showcase several examples, as well as present ideas for keeping our clients actively engaged in statistical methodology discussions. |
Mr. Paul Johnson MCOTEA |
Breakout | materials | 2017 | ||||||||
DOE Case Studies in Aerospace Research and Development (Abstract)
This presentation will provide a high-level view of recent DOE applications to aerospace research. Two broad categories are defined: aerodynamic force measurement system calibrations and aircraft model wind tunnel aerodynamic characterization. Each case study will outline the application of DOE principles including design choices, accommodations for deviations from classical DOE approaches, discoveries, and practical lessons learned.
Case Studies
Aerodynamic Force Measurement System Calibrations:
Large External Wind Tunnel Balance Calibration – Fractional factorial – Working with non-ideal factor settings – Customer-driven uncertainty assessment
Internal Balance Calibration Including Temperature – Restrictions to randomization – split-plot design requirements – Constraints to basic force model – Crossed design approach
Aircraft Model Wind Tunnel Aerodynamic Characterization:
The NASA/Boeing X-48B Blended Wing Body Low-Speed Wind Tunnel Test – Overcoming a culture of OFAT – General approach to investigating a new aircraft configuration – Use of automated wind tunnel models and randomization – Power of residual analysis in detecting problems
NASA GL-10 UAV Aerodynamic Characterization – Use of the Nested Face-Centered Design for aerodynamic characterization – Issues working with over 20 factors – Discoveries |
Dr. Drew Landman Old Dominion University |
Breakout | materials | 2017 | ||||||||
Sample Size and Considerations for Statistical Power (Abstract)
Sample size drives the resources and supports the conclusions of operational test. Power analysis is a common statistical methodology used in planning efforts to justify the number of samples. Power analysis is sensitive to extreme performance (e.g. 0.1% correct responses or 99.999% correct responses) relative to a threshold value, extremes in response variable variability, numbers of factors and levels, system complexity, and a myriad of other design- and system-specific criteria. This discussion will describe considerations (correlation/aliasing, operational significance, thresholds, etc.) and relationships (design, difference to detect, noise, etc.) associated with power. The contribution of power to design selection or adequacy must often be tempered when significant uncertainty or test resources constraints exist. In these situations, other measures of merit and alternative analytical approaches become at least as important as power in the development of designs that achieve the desired technical adequacy. In conclusion, one must understand what power is, what factors influence the calculation, and when to leverage alternative measures of merit. |
Mr. Vance Oas AFOTEC |
Breakout | materials | 2017 | ||||||||
Sample Size and Considerations for Statistical Power (Abstract)
Sample size drives the resources and supports the conclusions of operational test. Power analysis is a common statistical methodology used in planning efforts to justify the number of samples. Power analysis is sensitive to extreme performance (e.g. 0.1% correct responses or 99.999% correct responses) relative to a threshold value, extremes in response variable variability, numbers of factors and levels, system complexity, and a myriad of other design- and system-specific criteria. This discussion will describe considerations (correlation/aliasing, operational significance, thresholds, etc.) and relationships (design, difference to detect, noise, etc.) associated with power. The contribution of power to design selection or adequacy must often be tempered when significant uncertainty or test resources constraints exist. In these situations, other measures of merit and alternative analytical approaches become at least as important as power in the development of designs that achieve the desired technical adequacy. In conclusion, one must understand what power is, what factors influence the calculation, and when to leverage alternative measures of merit. |
Mr. Nick Garcia AFOTEC |
Breakout | materials | 2017 | ||||||||
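A short sketch of the basic power computation this talk revolves around, assuming a two-sample comparison of a continuous response: it shows how the required runs per group move with the difference to detect (in standard-deviation units) at fixed significance level and power.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
alpha, power = 0.05, 0.80

for effect_size in (0.3, 0.5, 0.8, 1.2):     # difference to detect / noise SD
    n = analysis.solve_power(effect_size=effect_size, alpha=alpha, power=power)
    print(f"effect size {effect_size:.1f} -> about {int(round(n))} runs per group")
```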
Experimental Design for Composite Pressure Vessel Life Prediction (Abstract)
One of the major pillars of experimental design is sequential learning. The experimental design should not be viewed as a “one-shot” effort, but rather as a series of experiments where each stage builds upon information learned from the previous study. It is within this realm of sequential learning that experimentation soundly supports the application of the scientific method. This presentation illustrates the value of sequential experimentation and also the connection between the scientific method and experimentation through a discussion of a multi-stage project supported by the NASA Engineering and Safety Center (NESC), where the objective was to assess the safety of composite overwrapped pressure vessels (COPVs). The analytical team was tasked with devising a test plan to model stress rupture failure risk in the carbon fiber strands that encase the COPVs, with the goal of understanding the reliability of the strands at use conditions over the expected mission life. This presentation highlights the recommended experimental design for the strand tests and then discusses the benefits that resulted from the suggested sequential testing protocol. |
Dr. Anne Driscoll Virginia Tech |
Breakout | materials | 2017 | ||||||||
Improving Sensitivity Experiments (Abstract)
This presentation will provide a brief overview of sensitivity testing and emphasize applications to several products and systems of importance to the defense community as well as private industry, including insensitive energetics, ballistic testing of protective armor, testing of munition fuzes and Microelectromechanical Systems (MEMS) components, safety testing of high-pressure test ammunition, and packaging for high-value materials. |
Mr. Douglas Ray US Army RDECOM ARDEC |
Breakout | materials | 2017 | ||||||||
Improving Sensitivity Experiments (Abstract)
This presentation will provide a brief overview of sensitivity testing and emphasize applications to several products and systems of importance to the defense community as well as private industry, including insensitive energetics, ballistic testing of protective armor, testing of munition fuzes and Microelectromechanical Systems (MEMS) components, safety testing of high-pressure test ammunition, and packaging for high-value materials. |
Mr. Kevin Singer US Army |
Breakout | materials | 2017 | ||||||||
Sequential Experimentation for a Binary Response – The Break Separation Method (Abstract)
Binary response experiments are common in epidemiology and biostatistics as well as in military applications. The Up and Down method, Langlie’s Method, Neyer’s method, the K in a Row method, and 3 Phase Optimal Design are methods used for sequential experimental design when there is a single continuous variable and a binary response. During this talk, we will discuss a new sequential experimental design approach called the Break Separation Method (BSM). BSM provides an algorithm for determining sequential experimental trials that will be used to find a median quantile and fit a logistic regression model using Maximum Likelihood estimation. BSM results in a small sample size and is designed to efficiently compute the median quantile. A minimal sketch of the final logistic-fitting step follows this entry. |
Mr. Darsh Thakkar RIT-S |
Breakout | materials | 2017 | ||||||||
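The final step described above, fitting a logistic regression by maximum likelihood and reporting a median quantile, can be sketched as follows. This is not the BSM sequential rule itself; the stimulus levels, response curve, and sample size below are invented for illustration.

```python
# Sketch of the final fitting step only (not the BSM sequential design rule):
# logistic regression by maximum likelihood for a binary response versus a
# single continuous stimulus, plus the implied median quantile (x at p = 0.5).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = np.repeat(np.linspace(1.0, 5.0, 9), 6)           # hypothetical stimulus levels
p_true = 1.0 / (1.0 + np.exp(-(x - 3.0) / 0.6))      # assumed true response curve
y = rng.binomial(1, p_true)                          # simulated binary outcomes

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=False)
b0, b1 = fit.params
median_quantile = -b0 / b1                           # stimulus level with 50% response
print("coefficients:", fit.params, "estimated median quantile:", median_quantile)
```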
Sequential Experimentation for a Binary Response – The Break Separation Method (Abstract)
Binary response experiments are common in epidemiology and biostatistics as well as in military applications. The Up and Down method, Langlie’s Method, Neyer’s method, the K in a Row method, and 3 Phase Optimal Design are methods used for sequential experimental design when there is a single continuous variable and a binary response. During this talk, we will discuss a new sequential experimental design approach called the Break Separation Method (BSM). BSM provides an algorithm for determining sequential experimental trials that will be used to find a median quantile and fit a logistic regression model using Maximum Likelihood estimation. BSM results in a small sample size and is designed to efficiently compute the median quantile. |
Dr. Rachel Silvestrini RIT-S |
Breakout | materials | 2017 | ||||||||
Carrier Reliability Model Validation (Abstract)
Model Validation for Simulations of CVN-78 Sortie Generation. As part of the test planning process, IDA is examining flight operations on the Navy’s newest carrier, CVN-78. The analysis uses a model, the IDA Virtual Carrier Model (IVCM), to examine sortie generation rates and whether aircraft can complete missions on time. Before using IVCM, it must be validated. However, CVN-78 has not been delivered to the Navy, and no data from actual operations are available to validate the model. Consequently, we will validate IVCM by comparing it to another model. This is a reasonable approach when a model is used in general analyses such as test planning, but is not acceptable when a model is used in the assessment of system effectiveness and suitability. The presentation examines the use of various statistical tools – Wilcoxon Rank Sum Test, Kolmogorov-Smirnov Test, and lognormal regression – to examine whether the two models provide similar results and to quantify the magnitude of any differences. From the analysis, IDA concluded that locations and distribution shapes are consistent, and that the differences between the models are less than 15 percent, which is acceptable for test planning. An illustrative two-sample comparison follows this entry. |
Dr. Dean Thomas IDA |
Breakout | 2017 | |||||||||
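As a rough illustration of the kind of two-model comparison described above (synthetic data, not IDA's IVCM output), the sketch below applies the Wilcoxon rank-sum and Kolmogorov-Smirnov tests to lognormal sortie-time samples and reports a percent difference on the log scale. All distributions and sample sizes are assumptions.

```python
# Illustrative two-sample comparison of outputs from two models (synthetic
# data standing in for the two simulations): location via Wilcoxon rank-sum,
# distribution shape via Kolmogorov-Smirnov, and a rough percent difference
# of medians computed on the log scale (a lognormal-style comparison).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
model_a = rng.lognormal(mean=3.0, sigma=0.4, size=200)   # e.g., minutes per sortie
model_b = rng.lognormal(mean=3.1, sigma=0.4, size=200)

w_stat, w_p = stats.ranksums(model_a, model_b)
ks_stat, ks_p = stats.ks_2samp(model_a, model_b)
pct_diff = (np.exp(np.log(model_b).mean() - np.log(model_a).mean()) - 1) * 100

print(f"rank-sum p = {w_p:.3f}, KS p = {ks_p:.3f}, median difference ~ {pct_diff:.1f}%")
```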
Model Based Systems Engineering Panel Discussion (Abstract)
This panel will share status, experiences and expectations within DoD and NASA for transitioning Systems Engineering to a more integrated digital engineering domain. A wide range of perspectives will be provided, covering the implementation waterfront of practitioner, management, research and strategy. Panelists will also be prepared to discuss more focused areas of digital systems engineering, such as test and evaluation, and engineering statistics. |
Mr. John Holladay NASA |
Breakout | materials | 2017 | ||||||||
Resampling Methods (Abstract)
Resampling Methods: This tutorial presents widely used resampling methods including bootstrapping, cross-validation, and permutation tests. Underlying theories will be presented briefly, but the primary focus will be on applications. A new graph-theoretic approach to change detection will be discussed as a specific application of permutation testing. Examples will be demonstrated in R; participants are encouraged to bring their own portable computers to follow along using datasets provided by the instructor. A minimal sketch of a bootstrap interval and a permutation test follows this entry. |
Dr. David Ruth United States Naval Academy |
Breakout | materials | 2017 | ||||||||
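The tutorial itself demonstrates these methods in R with instructor-provided datasets; as a language-agnostic illustration on made-up data, here is a minimal Python sketch of a percentile bootstrap interval for a mean and a permutation test for a difference in means.

```python
# Minimal resampling sketch (synthetic data; the tutorial uses R and its own
# datasets): a percentile bootstrap confidence interval for a mean and a
# permutation test for a difference in means.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10.0, 2.0, size=30)        # synthetic sample 1
y = rng.normal(11.0, 2.0, size=30)        # synthetic sample 2

# Percentile bootstrap CI for the mean of x
boot_means = [rng.choice(x, size=x.size, replace=True).mean() for _ in range(5000)]
ci = np.percentile(boot_means, [2.5, 97.5])

# Permutation test for mean(y) - mean(x)
observed = y.mean() - x.mean()
pooled = np.concatenate([x, y])
perm_diffs = []
for _ in range(5000):
    rng.shuffle(pooled)
    perm_diffs.append(pooled[x.size:].mean() - pooled[:x.size].mean())
p_value = np.mean(np.abs(perm_diffs) >= abs(observed))

print(f"bootstrap 95% CI for mean(x): {ci}, permutation p-value: {p_value:.3f}")
```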
Structured Decision Making (Abstract)
Difficult choices are often required in a decision-making process where resources and budgets are increasingly constrained. This talk demonstrates a structured decision-making approach using layered Pareto fronts to prioritize the allocation of funds between munitions stockpiles based on their estimated reliability, the urgency of needing available units, and the consequences if adequate numbers of units are not available. This case study illustrates the process of first identifying appropriate metrics that summarize important dimensions of the decision, and then eliminating non-contenders from further consideration in an objective stage. The final subjective stage incorporates subject matter expert priorities to select the four stockpiles to receive additional maintenance and surveillance funds based on understanding the trade-offs and robustness to various user priorities. |
Dr. Christine Anderson-Cook LANL |
Breakout | materials | 2017 | ||||||||
Do Asymmetries in Nuclear Arsenals Matter? (Abstract)
The importance of the nuclear balance vis-à-vis our principal adversary has been the subject of intense but unresolved debate in the international security community for almost seven decades. Perspectives on this question underlie national security policies regarding potential unilateral reductions in strategic nuclear forces, the imbalance of nonstrategic nuclear weapons in Europe, nuclear crisis management, nuclear proliferation, and nuclear doctrine. The overwhelming majority of past studies of the role of the nuclear balance in nuclear crisis evolution and outcome have been qualitative and focused on the relative importance of the nuclear balance and national resolve. Some recent analyses have invoked statistical methods; however, these quantitative studies have generated intense controversy because of concerns with analytic rigor. We apply a multi-disciplinary approach that combines historical case study, international relations theory, and appropriate statistical analysis. This approach results in defensible findings on causal mechanisms that regulate nuclear crisis resolution. Such findings should inform national security policy choices facing the Trump administration. |
Dr. Jane Pinelis Johns Hopkins University Applied Physics Laboratory (JHU APL) |
Breakout | materials | 2017 | ||||||||
Communication in Statistics & the Five Hardest Concepts |
Dr. Jennifer Van Mullekom Virginia Tech |
Breakout | 2017 | |||||||||
Uncertainty Quantification: What is it and Why it is Important to Test, Evaluation, and Modeling and Simulation in Defense and Aerospace (Abstract)
Uncertainty appears in many aspects of systems design including stochastic design parameters, simulation inputs, and forcing functions. Uncertainty Quantification (UQ) has emerged as the science of quantitative characterization and reduction of uncertainties in both simulation and test results. UQ is a multidisciplinary field with a broad base of methods including sensitivity analysis, statistical calibration, uncertainty propagation, and inverse analysis. Because of their ability to bring greater degrees of confidence to decisions, uncertainty quantification methods are playing a greater role in test, evaluation, and modeling and simulation in defense and aerospace. The value of UQ comes with better understanding of risk from assessing the uncertainty in test and modeling and simulation results. The presentation will provide an overview of UQ and then discuss the use of some advanced statistical methods, including DOEs and emulation for multiple simulation solvers and statistical calibration, for efficiently quantifying uncertainties. These statistical methods effectively link test, evaluation and modeling and simulation by coordinating the evaluation of uncertainties, simplifying verification and validation activities. A toy emulation sketch follows this entry. |
Dr. Peter Qian University of Wisconsin and SmartUQ |
Breakout | materials | 2017 | ||||||||
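One of the methods named above, emulation, can be illustrated with a toy sketch (not the presenter's tools or solvers): a Gaussian-process emulator is fit to a few runs of a stand-in "expensive" function and then used to propagate input uncertainty by cheap Monte Carlo. The function, design points, and input distribution below are all assumptions made for illustration.

```python
# Toy emulation sketch: fit a Gaussian-process emulator to a handful of
# "expensive solver" runs, then propagate input uncertainty through the
# cheap emulator by Monte Carlo.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expensive_solver(x):                 # stand-in for a costly simulation
    return np.sin(3 * x) + 0.5 * x

X_train = np.linspace(0.0, 2.0, 8).reshape(-1, 1)   # a small DOE of solver runs
y_train = expensive_solver(X_train).ravel()

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(X_train, y_train)

rng = np.random.default_rng(3)
x_samples = rng.normal(1.0, 0.2, size=10_000).reshape(-1, 1)   # uncertain input
y_samples = gp.predict(x_samples)
print(f"emulated output mean = {y_samples.mean():.3f}, std = {y_samples.std():.3f}")
```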
Model Uncertainty and its Inclusion in Testing Results (Abstract)
Answers to real-world questions are often based on the use of judiciously chosen mathematical/statistical/physical models. In particular, assessment of failure probabilities of physical systems relies heavily on such models. Since no model describes the real world exactly, sensitivity analyses are conducted to examine influences of (small) perturbations of an assumed model. In this talk we present a structured approach, using an “Assumptions Lattice” and corresponding “Uncertainty Pyramid”, for transparently conveying the influence of various assumptions on analysis conclusions. We illustrate this process in the context of a simple multicomponent system. |
Dr. Steven Lund NIST |
Breakout | materials | 2017 | ||||||||
VV&UQ – Uncertainty Quantification for Model-Based Engineering of DoD Systems (Abstract)
The US Army ARDEC has recently established an initiative to integrate statistical and probabilistic techniques into engineering modeling and simulation (M&S) analytics typically used early in the design lifecycle to guide technology development. DOE-driven Uncertainty Quantification techniques, including statistically rigorous model verification and validation (V&V) approaches, enable engineering teams to identify, quantify, and account for sources of variation and uncertainties in design parameters, and identify opportunities to make technologies more robust, reliable, and resilient earlier in the product’s lifecycle. Several recent armament engineering case studies – each with unique considerations and challenges – will be discussed. |
Mr. Douglas Ray US Army RDECOM ARDEC |
Breakout | materials | 2017 | ||||||||
VV&UQ – Uncertainty Quantification for Model-Based Engineering of DoD Systems (Abstract)
The US Army ARDEC has recently established an initiative to integrate statistical and probabilistic techniques into engineering modeling and simulation (M&S) analytics typically used early in the design lifecycle to guide technology development. DOE-driven Uncertainty Quantification techniques, including statistically rigorous model verification and validation (V&V) approaches, enable engineering teams to identify, quantify, and account for sources of variation and uncertainties in design parameters, and identify opportunities to make technologies more robust, reliable, and resilient earlier in the product’s lifecycle. Several recent armament engineering case studies – each with unique considerations and challenges – will be discussed. |
Ms. Melissa Jablonski US Army |
Breakout | materials | 2017 | ||||||||
Data Visualization (Abstract)
Teams of people with many different talents and skills work together at NASA to improve our understanding of our planet Earth, our Sun and solar system, and the Universe. The Earth System is made up of complex interactions and dependencies of the solar, oceanic, terrestrial, atmospheric, and living components. Solar storms have been recognized as a cause of technological problems on Earth since the invention of the telegraph in the 19th century. Solar flares, coronal holes, and coronal mass ejections (CMEs) can emit large bursts of radiation, high-speed electrons and protons, and other highly energetic particles that are released from the sun and are sometimes directed at Earth. These particles and radiation can damage satellites in space, shut down power grids on Earth, cause GPS outages, and pose serious health concerns to humans flying at high altitudes on Earth, as well as astronauts in space. NASA builds and operates a fleet of satellites to study the sun and a fleet of satellites and aircraft to observe the Earth system. NASA’s computer models combine the observations with numerical models to understand how these systems work. Using satellite observations alongside computer models, we can combine many pieces of information to form a coherent view of Earth and the Sun. NASA research helps us understand how processes combine to affect life on Earth: this includes severe weather, health, changes in climate, and space weather. The Scientific Visualization Studio (SVS) wants you to learn about NASA programs through visualization. The SVS works closely with scientists in the creation of data visualizations, animations, and images in order to promote a greater understanding of Earth and Space Science research activities at NASA and within the academic research community supported by NASA. |
Ms. Lori Perkins NASA |
Breakout | 2017 | |||||||||
Big Data, Big Think (Abstract)
The NASA Big Data, Big Think team jump-starts coordination, strategy, and progress for NASA applications of Big Data Analytics techniques, fosters collaboration and teamwork among centers and improves agency-wide understanding of Big Data research techniques & technologies and their application to NASA mission domains. The effort brings the Agency’s Big Data community together and helps define near term projects and leverages expertise throughout the agency. This presentation will share examples of Big Data activities from the Agency and discuss knowledge areas and experiences, including data management, data analytics and visualization. |
Mr. Robert Beil NASA |
Breakout | materials | 2017 | ||||||||
Project Data Flow Is an Engineered System (Abstract)
Data within a project, investigation or test series are often seen as a bunch of numbers that were produced. While this is part of the story, it forgets the most important part: the data’s users. A more powerful process begins with an early focus on planning, executing and managing data flow within a test or project as a system, treating each handoff between internal and external stakeholders as a system interface. This presentation will persuade you why data production should be replaced by the idea of a data supply chain focused on goals and customers. The presenter will outline how this could be achieved in your team. The talk is aimed at not only project and data managers, but also team members who produce or use data. Retooling team thinking and processes along these lines will help communication, facilitate availability, display and understanding of data by any stakeholder, make data verification, validation and analysis easier, and help keep team members focused on what is necessary and important: solving the problem at hand. |
Mr. Ken Johnson NASA |
Breakout | materials | 2017 | ||||||||
Comparing Experimental Designs (Abstract)
This tutorial will show how to compare and choose experimental designs based on multiple criteria. Questions like “Which Design of Experiments (DOE) is better/best?” will be answered by looking at both data and graphics that show the relative performance of the designs based on multiple criteria, including: power of the designs for different model terms; how well the designs minimize predictive variance across the design space; the level to which model terms are confounded or correlated; and the relative efficiencies that measure how well coefficients are estimated or how well predictive variance is minimized. Many different case studies of screening, response surface, and screening augmented to response surface designs will be compared. Designs with both continuous and categorical factors, and with constraints on the experimental region, will also be compared. A toy comparison of two designs follows this entry. |
Dr. Tom Donnelly JMP |
Breakout | materials | 2017 | ||||||||
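As a toy illustration of scoring designs on more than one criterion (not the tutorial's case studies or JMP output), the sketch below compares a full 2^3 factorial against a regular half fraction on D-efficiency and on the largest absolute correlation among model columns for a main-effects-plus-two-factor-interaction model. The designs and criteria here are simplified stand-ins.

```python
# Toy design-comparison sketch: score two candidate designs, for a model with
# main effects and two-factor interactions, on D-efficiency and on the largest
# absolute pairwise correlation among model columns (a simple aliasing check).
import numpy as np
from itertools import product

def model_matrix(design):
    A, B, C = design.T
    return np.column_stack([np.ones(len(design)), A, B, C, A * B, A * C, B * C])

def d_efficiency(X):
    n, p = X.shape
    det = max(np.linalg.det(X.T @ X), 0.0)      # guard against tiny negative round-off
    return det ** (1.0 / p) / n

def max_correlation(X):
    corr = np.corrcoef(X[:, 1:], rowvar=False)  # skip the intercept column
    return np.abs(corr - np.eye(corr.shape[0])).max()

full = np.array(list(product([-1, 1], repeat=3)), dtype=float)   # 2^3 full factorial
half = full[[0, 3, 5, 6]]                                        # a regular half fraction

for name, d in [("full 2^3", full), ("half fraction", half)]:
    X = model_matrix(d)
    print(f"{name}: D-eff = {d_efficiency(X):.3f}, max |corr| = {max_correlation(X):.3f}")
```

The half fraction scores zero D-efficiency and unit correlation for this model because main effects are aliased with two-factor interactions, which is exactly the kind of tradeoff the tutorial proposes to visualize.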
Testing Autonomous Systems (Abstract)
Autonomous robotic systems (hereafter referred to simply as autonomous systems) have attracted interest in recent years as capabilities improve to operate in unstructured, dynamic environments without continuous human guidance. Acquisition of autonomous systems potentially decreases personnel costs and provides a capability to operate in dirty, dull, or dangerous mission segments or achieve greater operational performance. Autonomy enables a particular action of a system to be automatic or, within programmed boundaries, self-governing. For our purposes, autonomy is defined as the system having a set of intelligence-based capabilities (i.e., learned behaviors) that allows it to respond to situations that were not pre-programmed or anticipated (i.e., learning-based responses) prior to system deployment. Autonomous systems have a degree of self-governance and self-directed behavior, possibly with a human’s proxy for decisions. Because of these intelligence-based capabilities, autonomous systems pose new challenges in conducting test and evaluation that assures adequate performance, safety, and cybersecurity outcomes. We propose an autonomous systems architecture concept and map the elements of a decision-theoretic view of a generic decision problem to the components of this architecture. These models offer a foundation for developing a decision-based, common framework for autonomous systems. We also identify some of the various challenges faced by the Department of Defense (DoD) test and evaluation community in assuring the behavior of autonomous systems as well as test and evaluation requirements, processes, and methods needed to address these challenges. |
Dr. Darryl Ahner Director AFIT |
Breakout | materials | 2018 | ||||||||
Screening Experiments with Partial Replication (Abstract)
Small screening designs are frequently used in the initial stages of experimentation with the goal of identifying important main effects as well as to gain insight on potentially important two-factor interactions. Commonly utilized experimental designs for screening (e.g., resolution III or IV two-level fractional factorials, Plackett-Burman designs, etc.) are unreplicated and as such, provide no unbiased estimate of experimental error. However, if statistical inference is considered an integral part of the experimental analysis, one view is that inferential procedures should be performed using the unbiased pure error estimate. As full replication of an experiment may be quite costly, partial replication offers an alternative for obtaining a model independent error estimate. Gilmour and Trinca (2012, Applied Statistics) introduce criteria for the design of optimal experiments for statistical inference (providing for the optimal selection of replicated design points). We begin with an extension of their work by proposing a Bayesian criterion for the construction of partially replicated screening designs with less dependence on an assumed model. We then consider the use of the proposed criterion within the context of multi-criteria design selection where estimation and protection against model misspecification are considered. Insights for analysis and model selection in light of partial replication will be provided. |
Mr. David Edwards Virginia Commonwealth University |
Breakout | materials | 2018 | ||||||||
What is Bayesian Experimental Design? (Abstract)
In an experiment with a single factor with three levels, treatments A, B, and C, a single treatment is to be applied to each of several experimental units selected from some set of units. The response variable is continuous, and differences in its value show the relative effectiveness of the treatments. An experimental design will dictate which treatment is applied to what units. Since differences in the response variable are used to judge differences between treatments, the most important goal of the design is to prevent the treatment effect from being masked by some unrelated property of the experimental units. A second important function of the design is to ensure power, that is, that if the treatments are not equally effective, the differences in the response variable are likely to be larger than background noise. Classical experimental design theory uses three principles: replication, randomization, and blocking, to produce an experimental design. Replication refers to how many units are used, blocking is a possible grouping of the units to reduce between-unit heterogeneity, and randomization governs the assignment of units to treatment. Classical experimental designs are balanced as much as possible, that is, the three treatments are applied the same number of times in each potential block of units. Bayesian experimental design aims to make use of additional related information, often called prior information, to produce a design. The information may be in the form of related experimental results, for example, treatments A and B may have been previously studied. It could be additional information about the experimental units, or about the response variable. This additional information could be used to change the usual blocking, to reduce the number of units assigned to treatments A and B compared to C, and/or reduce the total number of units needed to ensure power. This talk aims to explain Bayesian design concepts and illustrate them on realistic examples. |
Blaza Toman Statistical Engineering Division, NIST |
Breakout | materials | 2018 | ||||||||
DOE and Test Automation for System of Systems TE (Abstract)
Rigorous, efficient and effective test science techniques are individually taking hold in many software centric DoD acquisition programs, both in developmental and operational test regimes. These techniques include agile software development, cybersecurity test and evaluation (T&E), design and analysis of experiments and automated software testing. Many software centric programs must also be tested together with other systems to demonstrate they can be successfully integrated into a more complex systems of systems. This presentation focuses on the two test science disciplines of designed experiments (DOE) and automated software testing (AST) and describes how they can be used effectively and leverage one another in planning for and executing a system of systems test strategy. We use the Navy’s Distributed Common Ground System as an example. |
Jim Simpson JK Analytics |
Breakout | materials | 2018 | ||||||||
Building A Universal Helicopter Noise Model Using Machine Learning (Abstract)
Helicopters serve a number of useful roles within the community; however, community acceptance of helicopter operations is often limited by the resulting noise. Because the noise characteristics of helicopters depend strongly on the operating condition of the vehicle, effective noise abatement procedures can be developed for a particular helicopter type, but only when the noisy regions of the operating envelope are identified. NASA Langley Research Center—often in collaboration with other US Government agencies, industry, and academia—has conducted noise measurements for a wide variety of helicopter types, from light commercial helicopters to heavy military utility helicopters. While this database is expansive, it covers only a fraction of helicopter types in current commercial and military service and was measured under a limited set of ambient conditions and vehicle configurations. This talk will describe a new “universal” helicopter noise model suitable for planning helicopter noise abatement procedures. Modern machine learning techniques will be combined with the principle of nondimensionalization and applied to NASA’s helicopter noise data in order to develop a model capable of estimating the noisy operating states of any conventional helicopter under any specific ambient conditions and vehicle configurations. |
Eric Greenwood Aeroacoustics Branch |
Breakout | materials | 2018 | ||||||||
Machine Learning to Assess Pilots’ Cognitive State (Abstract)
The goal of the Crew State Monitoring (CSM) project is to use machine learning models trained with physiological data to predict unsafe cognitive states in pilots such as Channelized Attention (CA) and Startle/Surprise (SS). These models will be used in a real-time system that predicts a pilot’s mental state every second, a tool that can be used to help pilots recognize and recover from these mental states. Pilots wore different sensors that collected physiological data such as 20-channel electroencephalography (EEG), respiration, and galvanic skin response (GSR). Pilots performed non-flight benchmark tasks designed to induce these states, and a flight simulation with “surprising” or “channelizing” events. The team created a pipeline to generate pilot-dependent models that train on benchmark data, are tuned on a portion of a flight task, and are deployed on the remaining flight task. The model is a series of anomaly-detection-based ensembles, where each ensemble focuses on predicting a single state. Ensembles were composed of several anomaly detectors such as One Class SVMs, each focusing on a different subset of sensor data. We will discuss the performance of these models, as well as the ongoing research generalizing models across pilots and improving accuracy. |
Tina Heinich Computer Engineer, OCIO Data Science Team AST, Data Systems |
Breakout | materials | 2018 | ||||||||
CYBER Penetration Testing and Statistical Analysis in DT&E (Abstract)
Reconnaissance, footprinting, and enumeration are critical steps in the CYBER penetration testing process because if these steps are not fully and extensively executed, the information available for finding a system’s vulnerabilities may be limited. During the CYBER testing process, penetration testers often find themselves doing the same initial enumeration scans over and over for each system under test. Because of this, automated scripts have been developed that take these mundane and repetitive manual steps and perform them automatically with little user input. Once automation is present in the penetration testing process, Scientific Test and Analysis Techniques (STAT) can be incorporated. By combining automation and STAT in the CYBER penetration testing process, Mr. Tim McLean at Marine Corps Tactical Systems Support Activity (MCTSSA) coined a new term called CYBERSTAT. CYBERSTAT is applying scientific test and analysis techniques to offensive CYBER penetration testing tools with an important realization that CYBERSTAT assumes the system under test is the offensive penetration test tool itself. By applying combinatorial testing techniques to the CYBER tool, the CYBER tool’s scope is expanded beyond “one at a time” uses as the combinations of the CYBER tool’s capabilities and options are explored and executed as test cases against the target system. In CYBERSTAT, the additional test cases produced by STAT can be run automatically using scripts. This talk will show how MCTSSA is preparing to use CYBERSTAT in the Developmental Test and Evaluation process of USMC Command and Control systems. |
Timothy McLean | Breakout | materials | 2018 | ||||||||
Method for Evaluating the Quality of Cybersecurity Defenses (Abstract)
This presentation discusses a methodology to use knowledge of cyber attacks and defender responses from operational assessments to gain insights into the defensive posture and to inform a strategy for improvement. The concept is to use the attack thread as the instrument to probe and measure the detection capability of the cyber defenses. The data enable a logistic regression approach to provide a quantitative basis for the analysis and recommendations. |
Dr. Shawn Whetstone Research Staff Member IDA |
Breakout | materials | 2018 | ||||||||
Metrics to Characterize Temporal Patterns in Lifespans of Artifacts |
Soumyo Moitra Software Engineering Institute, Carnegie Mellon University |
Breakout | materials | 2018 | ||||||||
Uncertainty Quantification with Mixed Uncertainty Sources (Abstract)
Over the past decade, uncertainty quantification has become an integral part of engineering design and analysis. Both NASA and the DoD are making significant investments to advance the science of uncertainty quantification, increase the knowledge base, and strategically expand its use. This increased use of uncertainty-based results improves investment strategies and decision making. However, in complex systems, many challenges still exist when dealing with uncertainty in cases that have sparse, unreliable, poorly understood, and/or conflicting data. Oftentimes, assumptions are made regarding the statistical nature of data that may not be well grounded, and the impact of those assumptions is not well understood, which can lead to ill-informed decision making. This talk will focus on the quantification of uncertainty when both well-characterized (aleatory) and poorly known (epistemic) uncertainty sources exist. Particular focus is given to the treatment and management of epistemic uncertainty. A summary of non-probabilistic methods will be presented along with the propagation of mixed uncertainty and optimization under uncertainty. A discussion of decision making under uncertainty is also included to illustrate the use of uncertainty quantification. A minimal double-loop Monte Carlo sketch follows this entry. |
Tom West | Breakout | 2018 | |||||||||
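One common way to treat mixed aleatory/epistemic uncertainty, double-loop (second-order) Monte Carlo, can be sketched with entirely made-up numbers: the outer loop samples an epistemic parameter known only to an interval, the inner loop samples aleatory variability, and the result is a range of failure-probability estimates rather than a single value. This is offered only as an illustration of the idea, not as the speaker's method; the response model, interval, and threshold are assumptions.

```python
# Minimal double-loop (second-order) Monte Carlo sketch with made-up numbers:
# outer loop = epistemic parameter known only to an interval,
# inner loop = aleatory variability, output = a range of failure probabilities.
import numpy as np

rng = np.random.default_rng(42)
THRESHOLD = 3.0                      # assumed failure threshold on the response

def response(load, capacity_shift):  # toy performance model
    return load + capacity_shift

failure_probs = []
for _ in range(200):                              # epistemic (outer) loop
    capacity_shift = rng.uniform(-0.5, 0.5)       # known only to an interval
    loads = rng.normal(2.0, 0.6, size=20_000)     # aleatory (inner) variability
    failure_probs.append(np.mean(response(loads, capacity_shift) > THRESHOLD))

print(f"failure probability ranges from {min(failure_probs):.4f} "
      f"to {max(failure_probs):.4f} over the epistemic interval")
```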
System Level Uncertainty Quantification for Low-Boom Supersonic Flight Vehicles (Abstract)
Under current FAA regulations, civilian aircraft may not operate at supersonic speeds over land. However, over the past few decades, there have been renewed efforts to invest in technologies to mitigate sonic boom from supersonic aircraft through advances in both vehicle design and sonic boom prediction. NASA has heavily invested in tools and technologies to enable commercial supersonic flight and currently has several technical challenges related to sonic boom reduction. One specific technical challenge relates to the development of tools and methods to predict, under uncertainty, the noise on the ground generated by an aircraft flying at supersonic speeds. In attempting to predict ground noise, many factors from multiple disciplines must be considered. Further, classification and treatment of uncertainties in coupled systems, multifidelity simulations, experimental data, and community responses are all concerns in system-level analysis of sonic boom prediction. This presentation will introduce the various methodologies and techniques utilized for uncertainty quantification with a focus on the buildup to system-level analysis. An overview of recent research activities and case studies investigating the impact of various disciplines and factors on variance in ground noise will be discussed. |
Ben Phillips | Breakout | 2018 | |||||||||
Uncertainty Quantification and Analysis at The Boeing Company (Abstract)
The Boeing Company is assessing uncertainty quantification methodologies across many phases of aircraft design in order to establish confidence in computational fluid dynamics-based simulations of aircraft performance. This presentation provides an overview of several of these efforts. First, the uncertainty in aerodynamic performance metrics of a commercial aircraft at transonic cruise due to turbulence model and flight condition variability is assessed using 3D CFD with non-intrusive polynomial chaos and second order probability. Second, a sample computation of uncertainty in increments is performed for an engineering trade study, leading to the development of a new method for propagating input-uncontrolled uncertainties as well as input-controlled uncertainties. This type of consideration is necessary to account for variability associated with grid convergence on different configurations, for example. Finally, progress toward applying the computed uncertainties in forces and moments into an aerodynamic database used for flight simulation will be discussed. This approach uses a combination of Gaussian processes and multiple-fidelity Kriging meta-modeling to synthesize the required data. |
John Schaefer Sandia National Laboratories |
Breakout | materials | 2018 | ||||||||
Interface design for analysts in a data and analysis-rich environment (Abstract)
Increasingly, humans will rely on the outputs of our computational partners to make sense of the complex systems in our world. To be employed, the statistical and algorithmic analysis tools that are deployed to analysts’ toolboxes must afford their proper use and interpretation. Interface design for these tool users should provide decision support appropriate for the current stage of sensemaking. Understanding how users build, test, and elaborate their mental models of complex systems can guide the development of robust interfaces. |
Karin Butler Sandia National Laboratories |
Breakout | materials | 2018 | ||||||||
Asparagus is the most articulate vegetable ever (Abstract)
During the summer of 2001, Microsoft launched Windows XP, which was lauded by many users as the most reliable and usable operating system at the time. Miami Herald columnist Dave Barry responded to this praise by stating that “this is like saying asparagus is the most articulate vegetable ever.” Whether you agree or disagree with Dave Barry, these users’ reactions are relative (to other operating systems and to other past and future versions). This is due to an array of technological factors that have facilitated human-system improvements. Automation is often cited as improving human-system performance across many domains. It is true that when the human and automation are aligned, performance improves. But what about the times that this is not the case? This presentation will describe the myths and facts about human-system performance and increasing levels of automation through examples of human-system R&D conducted on a satellite ground system. Factors that affect human-system performance and a method to characterize mission performance as it relates to increasing levels of automation will also be discussed. |
Kerstan Cole | Breakout | 2018 | |||||||||
Mitigating Pilot Disorientation with Synthetic Vision Displays (Abstract)
Loss of control in flight has been a leading cause of accidents and incidents in commercial aviation worldwide. The Commercial Aviation Safety Team (CAST) requested studies on virtual day-visual meteorological conditions displays, such as synthetic vision, in order to combat loss of control. Over the last four years NASA has conducted a series of experiments evaluating the efficacy of synthetic vision displays for increased spatial awareness. Commercial pilots with various levels of experience from both domestic and international airlines were used as subjects. This presentation describes the synthetic vision research and how pilot subjects affected experiment design and statistical analyses. |
Kathryn Ballard NASA |
Breakout | materials | 2018 | ||||||||
SAGE III SEU Statistical Analysis Model (Abstract)
The Stratospheric Aerosol and Gas Experiment (SAGE III) aboard the International Space Station (ISS) was experiencing a series of anomalies called Single Event Upsets (SEUs). Booz Allen Hamilton was tasked with conducting a statistical analysis to model the incidence of SEUs in the SAGE III equipment aboard the ISS. The team identified factors correlated with SEU incidences, set up a model to track degradation of Sage III, and showed current and past probabilities as a function of the space environment. The space environment of SAGE III was studied to identify possible causes of SEUs. The analysis revealed variables most correlated with the anomalies, including solar wind strength, solar and geomagnetic field behavior, and location/orientation of the ISS, sun, and moon. The data was gathered from a variety of sources including US government agencies, foreign and domestic academic centers, and state-of-the-art simulation algorithms and programs. Logistic regression was used to analyze SEUs and gain preliminary results. The data was divided into small time intervals to approximate independence and allow logistic regression. Due to the rarity of events the initial model results were based on few SEUs. The team set up a Graphical User Interface (GUI) program to automatically analyze new data as it became available to the SAGE III team. A GUI was built to allow the addition of more data over the life of the SAGE III mission. As more SEU incidents occur and are entered into the model, its predictive power will grow significantly. The GUI enables the user to easily rerun the regression analysis and visualize its results to inform operational decision making. |
Ray McCollum Booz Allen Hamilton |
Breakout | materials | 2018 | ||||||||
Initial Validation of the Trust of Automated System Test (Abstract)
Automated systems are technologies that actively select data, transform information, make decisions, and control processes. The U.S. military uses automated systems to perform search and rescue and reconnaissance missions, and to assume control of aircraft to avoid ground collision. Facilitating appropriate trust in automated systems is essential to improving the safety and performance of human-system interactions. In two studies, we developed and validated an instrument to measure trust in automated systems. In study 1, we demonstrated that the scale has a 2-factor structure and demonstrates concurrent validity. We replicated these results using an independent sample in study 2. |
Dr. Heather Wojton Research Staff Member IDA |
Breakout | materials | 2018 | ||||||||
Operational Evaluation of a Flight-deck Software Application (Abstract)
Traffic Aware Strategic Aircrew Requests (TASAR) is a NASA-developed operational concept for flight efficiency and route optimization for the near-term airline flight deck. TASAR provides the aircrew with a cockpit automation tool that leverages a growing number of information sources on the flight deck to make fuel- and time-saving route optimization recommendations while en route. In partnership with a commercial airline, research prototype software that implements TASAR has been installed on three aircraft to enable the evaluation of this software in operational use. During the flight trials, data are being collected to quantify operational performance, which will enable NASA to improve algorithms and enhance functionality in the software based on real-world user experience. This presentation highlights statistical challenges and discusses lessons learned during the initial stages of the operational evaluation. |
Sara Wilson NASA |
Breakout | materials | 2018 | ||||||||
Comparing M&S Output to Live Test Data: A Missile System Case Study (Abstract)
In the operational testing of DoD weapons systems, modeling and simulation (M&S) is often used to supplement live test data in order to support a more complete and rigorous evaluation. Before the output of the M&S is included in reports to decision makers, it must first be thoroughly verified and validated to show that it adequately represents the real world for the purposes of the intended use. Part of the validation process should include a statistical comparison of live data to M&S output. This presentation includes an example of one such validation analysis for a tactical missile system. In this case, the goal is to validate a lethality model that predicts the likelihood of destroying a particular enemy target. Using design of experiments, along with basic analysis techniques such as the Kolmogorov-Smirnov test and Poisson regression, we can explore differences between the M&S and live data across multiple operational conditions and quantify the associated uncertainties. A small synthetic sketch of one such comparison follows this entry. |
Dr. Kelly Avery Research Staff Member IDA |
Breakout | materials | 2018 | ||||||||
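One of the techniques named above, Poisson regression, can be sketched on synthetic data (not the missile-system data): an event count is regressed on an indicator for data source, so the exponentiated coefficient estimates the M&S-to-live rate ratio. The counts, rates, and sample sizes below are invented for illustration.

```python
# Sketch of one validation comparison (synthetic data, not the tactical
# missile M&S): Poisson regression of an event count on an indicator for
# data source (live test vs. M&S); the source effect quantifies discrepancy.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n_live, n_sim = 30, 300
counts = np.concatenate([rng.poisson(2.0, n_live),      # live trials
                         rng.poisson(2.4, n_sim)])      # M&S replications
is_sim = np.concatenate([np.zeros(n_live), np.ones(n_sim)])

X = sm.add_constant(is_sim)
fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
ratio = np.exp(fit.params[1])                            # M&S rate / live rate
print(f"source coefficient = {fit.params[1]:.3f}, "
      f"estimated M&S-to-live rate ratio ~ {ratio:.2f}")
```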
NASA’s Human Exploration Research Analog (HERA): An analog mission for isolation, confinement, and remote conditions in space exploration scenarios (Abstract)
Shelley Cazares served as a crewmember of the 14th mission of NASA’s Human Exploration Research Analog (HERA). In August 2017, Dr. Cazares and her three crewmates were enclosed in an approximately 600-sq. ft. simulated spacecraft for an anticipated 45 days of confined isolation at Johnson Space Center, Houston, TX. In preparation for long-duration missions to Mars in the 2030s and beyond, NASA seeks to understand what types of diets, habitats, and activities can keep astronauts healthy and happy on deep space voyages. To collect this information, NASA is conducting several analog missions simulating the conditions astronauts face in space. HERA is a set of experiments to investigate the effects of isolation, confinement, and remote conditions in space exploration scenarios. Dr. Cazares will discuss the application procedure, the pre-mission training process, the life and times inside the habitat during the mission, and her crew’s emergency evacuation from the habitat due to the risk of rising floodwaters in Hurricane Harvey. |
Shelley Cazares NASA |
Breakout | 2018 | |||||||||
Reliability Fundamentals and Analysis Lessons Learned (Abstract)
Although reliability analysis is a part of Operational Test and Evaluation, it is uncommon for analysts to have a background in reliability theory or experience applying it. This presentation highlights some lessons learned from reliability analysis conducted on several AFOTEC test programs. Topics include issues related to censored data, limitations of and alternatives to the exponential distribution, and failure rate analysis using test data. A small synthetic sketch of censored-data fitting follows this entry. |
Dan Telford AFOTEC |
Breakout | materials | 2018 | ||||||||
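Two of the topics above, censored data and the limits of the exponential assumption, can be illustrated with a small synthetic sketch: a Weibull distribution is fit to right-censored failure times by maximum likelihood, and a shape parameter far from 1 argues against the convenient exponential model. The data, censoring time, and parameterization are assumptions made only for illustration.

```python
# Sketch (synthetic data): fit a Weibull to right-censored failure times by
# maximum likelihood and check whether the shape parameter is near 1, i.e.,
# whether the exponential (constant failure rate) assumption is supported.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(11)
true_shape, true_scale, censor_time = 1.8, 100.0, 120.0
t_raw = true_scale * rng.weibull(true_shape, size=60)    # synthetic failure times
observed = t_raw < censor_time                           # False = right-censored
t = np.where(observed, t_raw, censor_time)

def neg_log_lik(params):
    log_k, log_lam = params
    k, lam = np.exp(log_k), np.exp(log_lam)              # enforce positivity
    z = (t / lam) ** k
    log_pdf = np.log(k / lam) + (k - 1) * np.log(t / lam) - z   # failures
    log_surv = -z                                               # censored units
    return -(np.sum(log_pdf[observed]) + np.sum(log_surv[~observed]))

res = minimize(neg_log_lik, x0=[0.0, np.log(t.mean())], method="Nelder-Mead")
shape_hat, scale_hat = np.exp(res.x)
print(f"shape ~ {shape_hat:.2f} (exponential would be 1), scale ~ {scale_hat:.1f}")
```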
Design and Analysis of Nonlinear Models for the Mars 2020 Rover (Abstract)
The Mars Rover 2020 team commonly faces nonlinear behavior across the test program that is often closely related to the underlying physics. Classical and newer response surface designs do well with quadratic approximations, while space-filling designs have proven useful for modeling & simulation of complex surfaces. This talk specifically covers fitting nonlinear equations based on engineering functional forms as well as sigmoid and exponential decay curves. We demonstrate best practices on how to design and augment nonlinear designs using the Bayesian D-optimal criterion. Several examples, including drill bit degradation, illustrate the relative ease of implementation with popular software and the utility of these methods. A brief curve-fitting sketch follows this entry. |
Jim Wisnowski | Breakout | materials | 2018 | ||||||||
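As a brief illustration of fitting an engineering functional form (not the Mars 2020 team's data or software), the sketch below fits an exponential-decay curve to synthetic degradation-style data with nonlinear least squares; the model form, true parameters, and noise level are assumptions.

```python
# Sketch of fitting an assumed engineering functional form to synthetic data:
# exponential decay y = a * exp(-b * x) + c fit by nonlinear least squares.
import numpy as np
from scipy.optimize import curve_fit

def exp_decay(x, a, b, c):
    return a * np.exp(-b * x) + c

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 25)                                  # e.g., hours of operation
y = exp_decay(x, a=5.0, b=0.4, c=1.0) + rng.normal(0, 0.15, x.size)

popt, pcov = curve_fit(exp_decay, x, y, p0=[4.0, 0.5, 0.5])  # starting guesses assumed
stderr = np.sqrt(np.diag(pcov))
print("estimates:", popt, "approximate standard errors:", stderr)
```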
Cases of Second-Order Split-Plot Designs (Abstract)
The fundamental principles of experiment design are factorization, replication, randomization, and local control of error. In many industries, however, departure from these principles is commonplace. Often in our experiments complete randomization is not feasible because the factor level settings are hard, impractical, or inconvenient to change or the resources available to execute under homogeneous conditions are limited. These restrictions in randomization lead to split-plot experiments. We are also often interested in fitting second-order models leading to second-order split-plot experiments. Although response surface methodology has grown tremendously since 1951, the lack of alternatives for second-order split-plots remains largely unexplored. The literature and textbooks offer limited examples and provide guidelines that often are too general. This deficit of information leaves practitioners ill prepared to face the many roadblocks associated with these types of designs. This presentation provides practical strategies to help practitioners in dealing with second-order split-plot and by extension, split-split-plot experiments, including an innovative approach for the construction of a response surface design referred to as second-order sub-array Cartesian product split-plot design. This new type of design, which is an alternative to other classes of split-plot designs that are currently in use in defense and industrial applications, is economical, has a low prediction variance of the regression coefficients, and low aliasing between model terms. Based on an assessment using well accepted key design evaluation criterion, second-order sub-array Cartesian product split-plot designs perform as well as historical designs that have been considered standards up to this point. |
Dr. Luis Cortes MITRE |
Breakout | materials | 2018 | ||||||||
The Development and Execution of Split-Plot Designs In Navy Operational Test and Evaluation: A Practitioner’s Perspective (Abstract)
Randomization is one of the basic principles of experimental design and the associated statistical methods. In Navy operational testing, complete randomization is often not possible due to scheduling or execution constraints. Given these constraints, operational test designers often utilize split-plot designs to accommodate the hard-to-change nature of various factors of interest. Several case studies will be presented to provide insight into the challenges associated with Navy operational test design and execution. |
Stargel Doane | Breakout | 2018 | |||||||||
B-52 Radar Modernization Test Design Considerations (Abstract)
Inherent system processes, restrictions on collection, or cost may impact the practical execution of an operational test. This study presents the use of blocking and split-plot designs when complete randomization is not feasible in operational test. Specifically, the USAF B-52 Radar Modernization Program test design is used to present tradeoffs of different design choices and the impacts of those choices on cost, operational relevance, and analytical rigor. |
Stuart Corbett AFOTEC |
Breakout | materials | 2018 | ||||||||
B-52 Radar Modernization Test Design Considerations (Abstract)
Inherent system processes, restrictions on collection, or cost may impact the practical execution of an operational test. This study presents the use of blocking and split-plot designs when complete randomization is not feasible in operational test. Specifically, the USAF B-52 Radar Modernization Program test design is used to present tradeoffs of different design choices and the impacts of those choices on cost, operational relevance, and analytical rigor. |
Joseph Maloney AFOTEC |
Breakout | materials | 2018 | ||||||||
Experimental Design of a Unique Force Measurement System Calibration (Abstract)
Aerodynamic databases for space flight vehicles rely on wind-tunnel tests utilizing precision force measurement systems (FMS). Recently, NASA’s Space Launch System (SLS) program has conducted numerous wind-tunnel tests. This presentation will focus on the calibration of a unique booster FMS through the use of design of experiments (DoE) and regression modeling. Utilization of DoE resulted in a sparse, time-efficient design with results exceeding the researchers’ expectations. |
Ken Toro | Breakout | 2018 | |||||||||
Application of Design of Experiments to a Calibration of the National Transonic Facility (Abstract)
Recent work at the National Transonic Facility (NTF) at the NASA Langley Research Center has shown that a substantial reduction in freestream pressure fluctuations can be achieved by positioning the moveable model support walls and plenum re-entry flaps to choke the flow just downstream of the test section. This choked condition reduces the upstream propagation of disturbances from the diffuser into the test section, resulting in improved Mach number control and reduced freestream variability. The choked conditions also affect the Mach number gradient and distribution in the test section, so a calibration experiment was undertaken to quantify the effects of the model support wall and re-entry flap movements on the facility freestream flow using a centerline static pipe. A design of experiments (DOE) approach was used to develop restricted-randomization experiments to determine the effects of total pressure, reference Mach number, model support wall angle, re-entry flap gap height, and test section longitudinal location on the centerline static pressure and local Mach number distributions for a reference Mach number range from 0.7 to 0.9. Tests were conducted using air as the test medium at a total temperature of 120 °F as well as for gaseous nitrogen at cryogenic total temperatures of -50, -150, and -250 °F. The resulting data were used to construct quadratic polynomial regression models for these factors using a Restricted Maximum Likelihood (REML) estimator approach. Independent validation data were acquired at off-design conditions to check the accuracy of the regression models. Additional experiments were designed and executed over the full Mach number range of the facility (0.2 ≤ Mref ≤ 1.1) at each of the four total temperature conditions, but with the model support walls and re-entry flaps set to their nominal positions, in order to provide calibration regression models for operational experiments where a choked condition downstream of the test section is either not feasible or not required. This presentation focuses on the design, execution, analysis, and results for the two experiments performed using air at a total temperature of 120 °F. Comparisons are made between the regression model output and validation data, as well as the legacy NTF calibration results, and future work is discussed. |
Matt Rhode NASA |
Breakout | materials | 2018 | ||||||||
Application of Design of Experiments to a Calibration of the National Transonic Facility (Abstract)
Recent work at the National Transonic Facility (NTF) at the NASA Langley Research Center has shown that a substantial reduction in freestream pressure fluctuations can be achieved by positioning the moveable model support walls and plenum re-entry flaps to choke the flow just downstream of the test section. This choked condition reduces the upstream propagation of disturbances from the diffuser into the test section, resulting in improved Mach number control and reduced freestream variability. The choked conditions also affect the Mach number gradient and distribution in the test section, so a calibration experiment was undertaken to quantify the effects of the model support wall and re-entry flap movements on the facility freestream flow using a centerline static pipe. A design of experiments (DOE) approach was used to develop restricted-randomization experiments to determine the effects of total pressure, reference Mach number, model support wall angle, re-entry flap gap height, and test section longitudinal location on the centerline static pressure and local Mach number distributions for a reference Mach number range from 0.7 to 0.9. Tests were conducted using air as the test medium at a total temperature of 120 °F as well as for gaseous nitrogen at cryogenic total temperatures of -50, -150, and -250 °F. The resulting data were used to construct quadratic polynomial regression models for these factors using a Restricted Maximum Likelihood (REML) estimator approach. Independent validation data were acquired at off-design conditions to check the accuracy of the regression models. Additional experiments were designed and executed over the full Mach number range of the facility (0.2 ≤ Mref ≤ 1.1) at each of the four total temperature conditions, but with the model support walls and re-entry flaps set to their nominal positions, in order to provide calibration regression models for operational experiments where a choked condition downstream of the test section is either not feasible or not required. This presentation focuses on the design, execution, analysis, and results for the two experiments performed using air at a total temperature of 120 °F. Comparisons are made between the regression model output and validation data, as well as the legacy NTF calibration results, and future work is discussed. |
Matt Bailey Jacobs Technology Inc |
Breakout | materials | 2018 | ||||||||
The Use of DOE vs OFAT in the Calibration of AEDC Wind Tunnels (Abstract)
The use of statistically rigorous methods to support testing at Arnold Engineering Development Complex (AEDC) has been an area of focus in recent years. As part of this effort, the use of Design of Experiments (DOE) has been introduced for calibration of AEDC wind tunnels. Historical calibration efforts used One-Factor-at-a-Time (OFAT) test matrices, with a concentration on conditions of interest to test customers. With the introduction of DOE, the number of test points collected during the calibration decreased, and the points were not necessarily located at historical calibration points. To validate the use of DOE for calibration purposes, the 4-ft Aerodynamic Wind Tunnel 4T was calibrated using both DOE and OFAT methods. The results from the OFAT calibration were compared to the model developed from the DOE data points, and it was determined that the DOE model sufficiently captured the tunnel behavior within the desired levels of uncertainty. DOE analysis also showed that within Tunnel 4T, systematic errors are insignificant as indicated by agreement noted between the two methods. Based on the results of this calibration, a decision was made to apply DOE methods to future tunnel calibrations, as appropriate. The development of the DOE matrix in Tunnel 4T required the consideration of operational limitations, measurement uncertainties, and differing tunnel behavior over the performance map. Traditional OFAT methods allowed tunnel operators to set conditions efficiently while minimizing time-consuming plant configuration changes. DOE methods, however, require the use of randomization, which had the potential to add significant operation time to the calibration. Additionally, certain tunnel parameters, such as variable porosity, are only of interest in a specific region of the performance map. In addition to operational concerns, measurement uncertainty was an important consideration for the DOE matrix. At low tunnel total pressures, the uncertainty in the Mach number measurements increases significantly. Aside from introducing non-constant variance into the calibration model, the large uncertainties at low pressures can increase overall uncertainty in the calibration in high pressure regions where the uncertainty would otherwise be lower. At high pressures and transonic Mach numbers, low Mach number uncertainties are required to meet drag count uncertainty requirements. To satisfy both the operational and calibration requirements, the DOE matrix was divided into multiple independent models over the tunnel performance map. Following the Tunnel 4T calibration, AEDC calibrated the Propulsion Wind Tunnel 16T, Hypersonic Wind Tunnels B and C, and the National Full-Scale Aerodynamics Complex (NFAC). DOE techniques were successfully applied to the calibration of Tunnel B and NFAC, while a combination of DOE and OFAT test methods was used in Tunnel 16T because of operational and uncertainty requirements over a portion of the performance map. Tunnel C was calibrated using OFAT because of operational constraints. The cost of calibrating these tunnels has not been significantly reduced through the use of DOE, but the characterization of test condition uncertainties is firmly based in statistical methods. |
Rebecca Rought AEDC/TSTA |
Breakout | materials | 2018 | ||||||||
Initial Investigation into the Psychoacoustic Properties of Small Unmanned Aerial System Noise (Abstract)
For the past several years, researchers at NASA Langley have been engaged in a series of projects to study the degree to which existing facilities and capabilities, originally created for work on full-scale aircraft, are extensible to smaller scales – those of the small unmanned aerial systems (sUAS, also UAVs and, colloquially, ‘drones’) that have been showing up in the nation’s airspace. This paper follows an effort that has led to an initial human-subject psychoacoustic test regarding the annoyance generated by sUAS noise. This effort spans three phases: 1. the collection of the sounds through field recordings, 2. the formulation and execution of a psychoacoustic test using those recordings, 3. the analysis of the data from that test. The data suggests a lack of parity between the noise of the recorded sUAS and that of a set of road vehicles that were also recorded and included in the test, as measured by a set of contemporary noise metrics. |
Andrew Christian Structural Acoustics Branch |
Breakout | materials | 2018 | ||||||||
Combining Human Factors Data and Models of Human Performance (Abstract)
As systems and missions become increasingly complex, the roles of humans throughout the mission life cycle are evolving. In areas such as maintenance and repair, hands-on tasks still dominate; however, new technologies have changed many tasks. For example, some critical human tasks have moved from manual control to supervisory control, often of systems at great distances (e.g., remotely piloting a vehicle, or science data collection on Mars). While achieving mission success remains the key human goal, almost all human performance metrics focus on failures rather than successes. This talk will examine the role of humans in creating mission success as well as new approaches for system validation testing needed to keep up with evolving systems and human roles. |
Cynthia Null Technical Fellow for Human Factors |
Breakout | materials | 2018 | ||||||||
A Multi-method, Triangulation Approach to Operational Testing (Abstract)
Humans are not produced in quality-controlled assembly lines, and we typically are much more variable than the mechanical systems we employ. This mismatch means that when characterizing the effectiveness of a system, the system must be considered in the context of its users. Accurate measurement is critical to this endeavor, yet while human variability is large, effort to reduce measurement error of those humans is relatively small. The following talk discusses the importance of using multiple measurement methods—triangulation—to reduce error and increase confidence when characterizing the quality of human-systems integration (HSI). A case study from an operational test of an attack helicopter demonstrates how triangulation enables more actionable recommendations. |
Dr. Daniel Porter Research Staff Member IDA |
Breakout | materials | 2018 | ||||||||
Illustrating the Importance of Uncertainty Quantification (UQ) in Munitions Modeling (Abstract)
The importance of the incorporation of Uncertainty Quantification (UQ) techniques into the design and analysis of Army systems is discussed. Relevant examples are presented where UQ would have been extremely useful. The intent of the presentation is to show the broad relevance of UQ and how, in the future, it will greatly improve the time to fielding and quality of developmental items. |
Donald Calucci | Breakout | 2018 | |||||||||
Space-Filling Designs for Robustness Experiments (Abstract)
To identify the robust settings of the control factors, it is very important to understand how they interact with the noise factors. In this article, we propose space-filling designs for computer experiments that are more capable of accurately estimating the control-by-noise interactions. Moreover, the existing space-filling designs focus on uniformly distributing the points in the design space, which are not suitable for noise factors because they usually follow non-uniform distributions such as normal distribution. This would suggest placing more points in the regions with high probability mass. However, noise factors also tend to have a smooth relationship with the response and therefore, placing more points towards the tails of the distribution is also useful for accurately estimating the relationship. These two opposing effects make the experimental design methodology a challenging problem. We propose optimal and computationally efficient solutions to this problem and demonstrate their advantages using simulated examples and a real industry example involving a manufacturing packing line. |
Roshan Vengazhiyil | Breakout | 2018 | |||||||||
Introduction of Uncertainty Quantification and Industry Challenges (Abstract)
Uncertainty is an inescapable reality that can be found in nearly all types of engineering analyses. It arises from sources like measurement inaccuracies, material properties, boundary and initial conditions, and modeling approximations. For example, the increasing use of numerical simulation models throughout industry promises improved design and insight at significantly lower costs and shorter timeframes than purely physical testing. However, the addition of numerical modeling has also introduced complexity and uncertainty to the process of generating actionable results. It has become not only possible, but vital to include Uncertainty Quantification (UQ) in engineering analysis. The competitive benefits of UQ include reduced development time and cost, improved designs, better understanding of risk, and quantifiable confidence in analysis results and engineering decisions. Unfortunately, there are significant cultural and technical challenges which prevent organizations from utilizing UQ methods and techniques in their engineering practice. This presentation will introduce UQ methodology and discuss the past and present strategies for addressing these challenges, making it possible to use UQ to enhance engineering processes with fewer resources and in more situations. Looking to the future, anticipated challenges will be discussed along with an outline of the path towards making UQ a common practice in engineering. |
Peter Chien | Breakout | 2018 | |||||||||
The Effect of Extremes in Small Sample Size on Simple Mixed Models: A Comparison of Level-1 and Level-2 Size (Abstract)
Mixed models are ideally suited to analyzing nested data from within-persons designs, designs that are advantageous in applied research. Mixed models have the advantage of enabling modeling of random effects, facilitating an accounting of the intra-person variation captured by multiple observations of the same participants and suggesting further lines of control to the researcher. However, the sampling requirements for mixed models are prohibitive to other areas which could greatly benefit from them. This simulation study examines the impact of small sample sizes in both levels of the model on the fixed effect bias, type I error, and power of a simple mixed model analysis. Despite the need for adjustments to control for type I error inflation, findings indicate that smaller samples than previously recognized can be used for mixed models under certain conditions prevalent in applied research. Examination of the marginal benefit of increases in sample subject and observation size provides applied researchers with guidance for developing mixed-model repeated measures designs that maximize power. |
Dr. Kristina Carter Research Staff Member IDA |
Breakout | materials | 2018 | ||||||||
Evaluating Deterministic Models of Time Series by Comparison to Observations (Abstract)
A standard paradigm for assessing the quality of model simulations is to compare what these models produce to experimental or observational samples of what the models seek to predict. Often these comparisons are based on simple summary statistics, even when the objects of interest are time series. Here, we propose a method of evaluation through probabilities derived from tests of hypotheses that model-simulated and observed time sequences share common signals. The probabilities are based on the behavior of summary statistics of model output and observational data, over ensembles of pseudo-realizations. These are obtained by partitioning the original time sequences into signal and noise components, and using a parametric bootstrap to create pseudo-realizations of the noise. We demonstrate with an example from climate model evaluation for which this methodology was developed. |
Amy Braverman Jet Propulsion Laboratory, California Institute of Technology |
Breakout | materials | 2018 | ||||||||
Challenger Challenge: Pass-Fail Thinking Increases Risk Measurably (Abstract)
Binomial (pass-fail) response metrics are far more commonly used in test, requirements, quality and engineering than they need to be. In fact, there is even an engineering school of thought that they’re superior to continuous-variable metrics. This is a serious, even dangerous problem in aerospace and other industries: think of the Space Shuttle Challenger accident. There are better ways. This talk will cover some examples of methods available to engineers and statisticians in common statistical software. It will not dig far into the mathematics of the methods, but will walk through where each method might be most useful and some of the pitfalls inherent in their use – including potential sources of misinterpretation and suspicion by your teammates and customers. The talk is geared toward engineers, managers and professionals in the –ilities who run into frustrations dealing with pass-fail data and thinking. |
Ken Johnson Applied Statistician NASA Engineering and Safety Center |
Breakout | materials | 2018 | ||||||||
Doppler Assisted Sensor Fusion for Tracking and Exploitation (Abstract)
We have developed a new sensor fusion approach called Doppler Assisted Sensor Fusion (DASF), which pairs a range rate profile from one moving sensor with a range rate profile from another sensor with high location accuracy. This pairing provides accurate identification, location, and tracking of moving emitters, with low association latency. The approach we use for data fusion is distinct from previous approaches. In the conventional approach, post-detection data from each sensor is overlaid with data from another sensor in an attempt to associate the data outputs. For the DASF approach, the fusion is at the sensor level: the first sensor collects data and provides the standard identification in addition to a unique emitter range rate profile. This profile is used to associate the emitter signature to a range-rate signature obtained by the geolocation sensor. The geolocation sensor then provides the desired location accuracy. We will provide results using real tracking data scenarios. |
J. Derek Tucker Sandia National Laboratories |
Breakout | materials | 2018 | ||||||||
XPCA: A Copula-based Generalization of PCA for Ordinal Data (Abstract)
Principal Component Analysis is a standard tool in an analyst’s toolbox. The standard practice of rescaling each column can be reframed as a copula-based decomposition in which the marginal distributions are fit with a univariate Gaussian distribution and the joint distribution is modeled with a Gaussian copula. In this light, we present an alternative to traditional PCA, called XPCA, which relaxes the marginal Gaussian assumption and instead fits each marginal distribution with the empirical distribution function. Interval-censoring methods are used to account for the discrete nature of the empirical distribution function when fitting the Gaussian copula model. In this talk, we derive the XPCA estimator and inspect the differences in fits on both simulated and real data applications. |
Cliff Anderson-Bergman Sandia National Laboratories |
Breakout | materials | 2018 | ||||||||
Insights, Predictions, and Actions: Descriptive Definitions of Data Science, Machine Learning, and Artificial Intelligence (Abstract)
The terms “Data Science”, “Machine Learning”, and “Artificial Intelligence” have become increasingly common in popular media, professional publications, and even in the language used by DoD leadership. But these terms are often not well understood, and may be used incorrectly and interchangeably. Even a textbook definition of these fields is unlikely to help with the distinction, as many definitions tend to lump everything under the umbrella of computer science or introduce unnecessary buzzwords. Leveraging a framework first proposed by David Robinson, Chief Data Scientist at DataCamp, we forgo the textbook definitions and instead focus on practical distinctions between the work of practitioners in each field, using examples relevant to the test and evaluation community where applicable. |
Dr. Andrew Flack Research Staff Member IDA |
Breakout | materials | 2018 | ||||||||
Can AI Predict Human Behavior? (Abstract)
Given the rapid increase of novel machine learning applications in cybersecurity and people analytics, there is significant evidence that these tools can give meaningful and actionable insights. Even so, great care must be taken to ensure that automated decision making tools are deployed in such a way as to mitigate bias in predictions and promote security of user data. In this talk, Dr. Burns will take a deep dive into an open source data set in the area of people analytics, demonstrating the application of basic machine learning techniques, while discussing limitations and potential pitfalls in using an algorithm to predict human behavior. In the end, Dr. Burns will draw a comparison between the potential to predict human behavioral propensities, such as becoming an insider threat, and the way assisted diagnosis tools are used in medicine to predict the development or recurrence of illnesses. |
Dustin Burns Senior Scientist Exponent (bio)
Dr. Dustin Burns is a Senior Scientist in the Statistical and Data Sciences practice at Exponent, a multidisciplinary scientific and engineering consulting firm dedicated to responding to the world’s most impactful business problems. Combining his background in laboratory experiments with his expertise in data analytics and machine learning, Dr. Burns works across many industries, including security, consumer electronics, utilities, and health sciences. He supports clients’ goals to modernize data collection and analytics strategies, extract information from unused data such as images and text, and test and validate existing systems. |
Breakout![]() |
![]() | 2020 | ||||||||
A Notional Case Study of Uncertainty Analysis in Live Fire Modeling and Simulation (Abstract)
A vulnerability assessment, which evaluates the ability of an armored combat vehicle and its crew to withstand the damaging effects of an anti-armor weapon, presents a unique challenge because vehicles are expensive and testing is destructive. This limits the number of full-up-system-level tests to quantities that generally do not support meaningful statistical inference. The prevailing solution to this problem is to obtain test data that is more affordable from sources which include component- and subsystem-level testing. This creates a new challenge that forms the premise of this paper: how can lower-level data sources be connected to provide a credible system-level prediction of vehicle vulnerability? This paper presents a case study that demonstrates an approach to this problem that emphasizes the use of fundamental statistical techniques — design of experiments, statistical modeling, and propagation of uncertainty — in the context of a combat scenario that depicts a ground vehicle being engaged by indirect artillery. |
Thomas Johnson Research Staff Member IDA |
Breakout | 2020 | |||||||||
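To make the propagation-of-uncertainty step above concrete, here is a minimal, generic Monte Carlo sketch (not the case study's actual model); the logistic penetration model and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical component-level inputs, summarized as probability distributions
# (means and standard deviations are illustrative only).
n_draws = 100_000
striking_velocity = rng.normal(loc=850.0, scale=40.0, size=n_draws)   # m/s
armor_v50 = rng.normal(loc=900.0, scale=25.0, size=n_draws)           # m/s

# Hypothetical lower-level model: probability of perforation as a logistic
# function of the velocity excess over the armor's V50.
def p_perforation(v, v50, scale=30.0):
    return 1.0 / (1.0 + np.exp(-(v - v50) / scale))

# Propagate the input uncertainty through the model by Monte Carlo.
p = p_perforation(striking_velocity, armor_v50)

# Report the induced uncertainty in the system-level prediction.
print(f"mean P(perforation): {p.mean():.3f}")
print(f"90% interval:        ({np.quantile(p, 0.05):.3f}, {np.quantile(p, 0.95):.3f})")
```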
KC-46A Adaptive Relevant Testing Strategies to Enable Incremental Evaluation (Abstract)
The DoD’s challenge to provide capability at the “Speed of Relevance” has generated many new strategies to adapt to rapid development and acquisition. As a result, Operational Test Agencies (OTA) have had to adjust their test processes to accommodate rapid, but incremental delivery of capability to the warfighter. The Air Force Operational Test and Evaluation Center (AFOTEC) developed the Adaptive Relevant Testing (ART) concept to answer the challenge. In this session, AFOTEC Test Analysts will brief examples and lessons learned from implementing the ART principles on the KC-46A acquisition program to identify problems early and promote the delivery of individual capabilities as they are available to test. The AFOTEC goal is to accomplish these incremental tests while maintaining a rigorous statistical evaluation in a relevant and timely manner. This discussion will explain in detail how the KC-46A Initial Operational Test and Evaluation (IOT&E) was accomplished in a unique way that allowed the test team to discover, report on, and correct major system deficiencies much earlier than traditional methods. |
J. Quinn Stank Lead KC-46 Analyst AFOTEC (bio)
First Lieutenant J. Quinn Stank is the Lead Analyst for the Air Force Operational Test and Evaluation Center Detachment 5 at Outside Location Everett, Washington. The lieutenant serves as the advisor to the Operational Test and Evaluation team for the KC-46A. Lieutenant Stank, originally from Knoxville, Tn., received his commission as a second lieutenant upon graduation from the United States Air Force Academy in 2016. EDUCATION:
|
Breakout![]() |
![]() | 2020 | ||||||||
An Approach to Assessing Sterilization Probabilistically (Abstract)
Sterility Assurance Level (SAL) is the probability that a product, after being exposed to a given sterilization process, contains one or more viable organisms. The SAL is a standard way of defining cleanliness requirements on a product or acceptability of a sterilization procedure in industry and regulatory agencies. Since the SAL acknowledges the inherent probabilistic nature of detecting sterility – that we cannot be absolutely sure that sterility is achieved – a probabilistic approach to its assessment is required that considers the actual end-to-end process involved with demonstrating sterility. Provided here is one such approach. We assume the process of demonstrating sterility is based on the scientific method, and therefore starts with a scientific hypothesis of the model that generates the life/death outcomes of the organisms that will be observed in experiment. Experiments are then designed (e.g. environmental conditions determined, reference/indicator organisms selected, number of samples/replicates, instrumentation) and performed, and an initial conclusion regarding the appropriate model is drawn from observed numbers of organisms remaining once exposed to the sterilization process. Ideally, this is then validated by future experiment by independent scientific inquiry, or the results are used to develop new hypotheses and the process repeats. Ultimately, a decision is made regarding the validity of the sterilization process by means of comparing with a SAL. Bayesian statistics naturally lends itself to develop a probability distribution from this process to compare against a given SAL, which is the approach taken in this paper. We provide an example application to provide a simple demonstration of this from actual experiments performed, and discuss its relevance to the future NASA mission, Mars Sample Return. |
Mike Dinicola System Engineer Jet Propulsion Laboratory |
Breakout | 2020 | |||||||||
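A minimal sketch of the kind of Bayesian comparison against a SAL described above, assuming a conjugate Beta-Binomial model; the counts and the acceptance threshold are hypothetical and are not taken from the paper.

```python
from scipy import stats

# Hypothetical sterility demonstration: n processed items examined,
# k found with one or more viable organisms (values are illustrative).
n, k = 300, 0
sal = 1e-3          # example acceptance threshold on P(non-sterile item)

# Beta(1, 1) prior on the per-item probability of non-sterility,
# conjugate to the binomial likelihood.
posterior = stats.beta(1 + k, 1 + n - k)

# Posterior probability that the per-item risk is below the SAL,
# plus an upper credible bound on that risk.
print(f"P(risk < SAL | data) = {posterior.cdf(sal):.3f}")
print(f"95% credible upper bound on risk = {posterior.ppf(0.95):.2e}")
```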
Reevaluating Planetary Protection Bioburden Accounting and Verification Reporting (Abstract)
Biological cleanliness of launched spacecraft destined to celestial bodies is governed by International Planetary Protection Guidelines. In particular, spacecraft that are targeting bodies which are of scientific interest in understanding the origins of life or thought to harbor extant life must undergo microbial reduction and recontamination prevention regimes throughout the hardware assembly, test and launch operations that result in a direct verification of biological cleanliness of spacecraft surfaces. As a result of this verification, associated biologicals are enumerated on petri dishes and then numerically treated to account for sample volumes, sample device efficiency, and laboratory processing efficiencies to arrive at a bioburden density. The current NASA approach utilizes a 1950s Viking-based mathematical treatment which factors in the raw colony count, sample device and processing efficiency and fraction of extract analyzed. Historically, per NASA direction, samples are grouped based upon flight hardware structural and functional proximity and if the value of raw colony counts is zero it is changed to one. In previous missions that launched from 1996 to 2018, a combination of Poisson and Gaussian statistics that evolved from mission to mission was utilized. In 2019, the statistical approach for performing bioburden accounting and verification reporting was re-evaluated to develop a technique that is both mathematically and biologically valid. Multiple mission datasets have been analyzed at a high level, and it has been determined that since there is a significant data set for each mission with low incidence rates of biological counts, a Bayesian model would be appropriate. Data from the InSight mission was then utilized to demonstrate the application of these models and approach on spacecraft sampled data using both informed and non-informed priors and subsequently compared to the current and historical mission mathematical treatments. The Bayesian models were within family to the previous and heritage approaches, and as an added benefit were able to provide a range and associated confidence intervals for the reported values. From the preliminary work, we propose that these models present a valid mathematical and biological approach for reporting spacecraft bioburden to be utilized in final requirements reporting and as an input into the initial bioburden population used for probabilistic risk assessments. Further development of the models will include a full spacecraft bioburden verification comparison as well as the utilization of ground truth experiments, as deemed necessary. |
James Benardini Sr. Planetary Protection Engineer Jet Propulsion Laboratory |
Breakout | 2020 | |||||||||
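A minimal sketch of a conjugate gamma-Poisson bioburden model in the spirit of the approach described above; the colony counts, sampled areas, and efficiencies are hypothetical, and this is not the InSight analysis or the NASA-approved treatment.

```python
import numpy as np
from scipy import stats

# Hypothetical swab/wipe samples from one hardware grouping:
# raw colony counts and the surface area represented by each sample (m^2).
counts = np.array([0, 1, 0, 0, 2, 0, 0, 1])
area_sampled = np.full(counts.size, 0.0025)   # m^2 per sample
recovery_efficiency = 0.3                     # device x processing efficiency

# Effective exposure for each sample: the area actually "seen" by the assay.
exposure = area_sampled * recovery_efficiency

# Gamma(a0, b0) prior on bioburden density (spores per m^2), conjugate to the
# Poisson likelihood for the counts; weakly informative values used here.
a0, b0 = 0.5, 0.01
a_post = a0 + counts.sum()
b_post = b0 + exposure.sum()
posterior = stats.gamma(a=a_post, scale=1.0 / b_post)

print(f"posterior mean density: {posterior.mean():.1f} spores/m^2")
print(f"95% credible interval:  ({posterior.ppf(0.025):.1f}, {posterior.ppf(0.975):.1f})")
```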
Development and Analytic Process Used to Develop a 3-Dimensional Graphical User Interface System for Baggage Screening (Abstract)
The Transportation Security Administration (TSA) uses several types of screening technologies for the purposes of threat detection at airports and federal facilities across the country. Computed Tomography (CT) systems afford TSA personnel in the Checked Baggage setting a quick and effective method to screen property with less need to physically inspect property due to their advanced imaging capabilities. Recent reductions in size, cost, and processing speed for CT systems spurred an interest in incorporating these advanced imaging systems at the Checkpoint to increase the speed and effectiveness of scanning personal property as well as passenger satisfaction during travel. The increase in speed and effectiveness of scanning personal property with fewer physical property inspections stems from several qualities native to CT imaging that current 2D X-Ray based Advanced Technology 2 (AT2) systems typically found at Checkpoints lack. Specifically, the CT offers rotatable 3D images and advanced identification algorithms that allow TSA personnel to more readily identify items requiring review on-screen without requesting that passengers remove them from their bag. The introduction of CT systems at domestic airports led to the identification of a few key Human Factors issues, however. Several vendors used divergent strategies to produce the CT systems introduced at domestic airport Checkpoints. Each system offered users different 3D visualizations, informational displays, and identification algorithms, offering a range of views, tools, layouts, and material colorization for users to sort through. The disparity in system similarity and potential for multiple systems to operate at a single airport resulted in unnecessarily complex training, testing, certification, and operating procedures. In response, a group of human factors engineers (HFEs) was tasked with creating requirements for a single common Graphical User Interface (GUI) for all CT systems that would provide a standard look, feel, and interaction across systems. We will discuss the development and analytic process used to 1.) gain an understanding of the tasks that CT systems must accomplish at the Checkpoint (i.e. focus groups), 2.) identify what tools Transportation Security Officers (TSOs) tend to use and why (i.e. focus groups and rank-ordered surveys), and 3.) determine how changes during iterative testing affect performance (i.e. A/B testing while collecting response time, accuracy, and tool usage). The data collection effort described here resulted in a set of requirements that produced a highly usable CT interface as measured by several valid and reliable objective and subjective measures. Perceptions of the CGUI’s usability (e.g., the System Usability Scale; SUS) were aligned with TSO performance (i.e., Pd, PFA, and Throughput) during use of the CGUI prototype. Iterative testing demonstrated an increase in the SUS score and performance measures for each revision of the requirements used to produce the common CT interface. User perspectives, feedback, and performance data also offered insight toward the determination of necessary future efforts that will increase user acceptance of the redesigned CT interface. Increasing user acceptance offers TSA the opportunity to improve user engagement, reduce errors, and increase the likelihood that the system will stay in service without a mandate. |
Charles McKee President and CEO Taverene Analytics LLC (bio)
Mr. McKee provides Test and Evaluation, Systems Engineering, Human Factors Engineering, Strategic Planning, Capture Planning, and Proposal Development support to companies supporting the Department of Defense and Department of Homeland Security. Recently served as President of the Board of Directors, International Test and Evaluation Association (ITEA), 2013 – 2015. Security Clearance: Secret, previously cleared for TS SCI. Homeland Security Vetted. TSA Professional Engineering Logistics Support Services (PELSS2) (May 2016 – present) for Global System Technologies (GST) and TSA Operational Test & Evaluation Support Services (OTSS) and Test & Evaluation Support Services (TESS) (Aug 2014 – 2016) for Engility. Provides Acquisition Management, Systems Engineering, Operational Test & Evaluation (OT&E), Human Factors Engineering (HFE), test planning, design of experiments, data collection, data analysis, statistics, and evaluation reporting on Transportation Security Equipment (TSE) systems deployed to Airports and Intermodal facilities. Led the design and development of a Common Graphical User Interface (CGUI) for new Checkpoint Computed Tomography systems. The CGUI design maximized the Probability of Detection, minimized probability of false alarms, while improving throughput time for screening accessible property by Transportation Security Officers (TSO’s) at airports. Division Manager, Alion Science and Technology, 2009-2014. Oversaw program management and technical support for the Test and Evaluation Division. Provided analysis support to multiple clients such as: Army Program Executive Office (PEO) Simulation Training Instrumentation (STRI) STARSHIP program and DISA Test and Evaluation (T&E) Mission Support Services contract. Provided Subject Matter Expertise to all client on program management, test and evaluation, statistical analysis, modeling and simulation, training, human factors engineering / human systems integration, and design of experiments. Operations Manager, SAIC, 2006-2009. Oversaw the program management and technical support for the Test, Evaluation, and Analysis Operation (TEAO). Provided analysis support to multiple clients such as the Director, Operational Test and Evaluation (DOT&E), Joint Test & Evaluation (JT&E), Test Resource Management Center (TRMC), OSD AT&L Systems Engineering, Defense Modeling and Simulation Office (DMSO), Air Force Test and Evaluation (AF/TE), US Joint Forces Command (USJFCOM) Joint Forces Integration and Interoperability Test (JFIIT), and the Air Combat Command (ACC) Red Flag Exercise Support. Provided Subject Matter Expertise to all clients on program management, test and evaluation, statistical analysis, modeling and simulation, training, human factors engineering / human systems integration, and design of experiments. Senior Program Manager, Human Factors Engineer. BDM / TRW / NGC (1989-2000 and 2003-2006). Provided Human Factors Engineering / Manpower Personnel Integration support to the Army Test and Evaluation Command (ATEC) / Army Evaluation Center (AEC), FAA Systems Engineering Integrated Product Team (SEIPT), and TSA Data Collection, Reduction, Analysis, Reporting, and Archiving (DCRARA) Support. Developed Evaluation Plans, Design of Experiments (DOE), requirements analysis, test planning, test execution, data collection, reduction, analysis, statistical analysis, and military assessments of Army programs. Supported HFE / MANPRINT working groups and System MANPRINT Management Plans. 
Conducted developmental assessments of System Safety, Manpower, Personnel, Training, and Human Systems Interfaces. MS, Industrial Engineering, NCSU, 1989. Major: Human Factors Engineering. Minor: Occupational Safety and Health. Scholarship from the National Institute of Occupational Safety and Health (NIOSH). Master’s Thesis on Cumulative Trauma Disorders in occupations with repetitive motion. |
Breakout![]() |
![]() | 2020 | ||||||||
Quantifying Computational Uncertainty in Supersonic Flow Predictions without Experimental Measurements (Abstract)
With the advancement of computational modeling, there is a push to reduce historically necessary experimental testing requirements for assessing vehicle component and system level performance. As a result, uncertainty quantification is a necessary part of predictive modeling, particularly regarding the modeling approach. In the absence of experimental data, model-form uncertainty may not be easily determined and model calibration may not be possible. Therefore, quantifying the potential variability as a result of model selection is required to accurately quantify performance, robustness, and reliability. This talk will outline a proposed approach to quantifying uncertainty in a variety of vehicle applications, with focus given particularly to supersonic flow applications. The aim is to first identify key sources of uncertainty in computational modeling, such as spatial and temporal discretization and turbulence modeling. Then, the classification and treatment of uncertainty sources is discussed, along with the potential impact of these uncertainties on performance predictions. Lastly, five upcoming tests in the NASA Langley Unitary Plan Wind Tunnel designed to test predictive capability are briefly described. |
Thomas West NASA Langley |
Breakout | 2020 | |||||||||
Title Coming Soon |
Kedar Phadke Phadke Associates, Inc |
Breakout | 2020 | |||||||||
Adoption Challenges in Artificial Intelligence and Machine Learning for Analytic Work Environments |
Laura McNamara Distinguished Member of Technical Staff Sandia National Laboratories (bio)
Dr. Laura A. McNamara is Distinguished Member of Technical Staff at Sandia National Laboratories. She’s spent her career partnering with computer scientists, software engineers, physicists, human factors experts, organizational psychologists, remote sensing and imagery scientists, and national security analysts in a wide range of settings. She has expertise in user-centered technology design and evaluation, information visualization/visual analytics, and mixed qualitative/quantitative social science research. Most of her projects involve challenges in sensor management, technology usability, and innovation feasibility and adoption. She enjoys working in Agile and Agile-like environments and is a skilled leader of interdisciplinary engineering, scientific, and software teams. She is passionate about ensuring usability, utility, and adaptability of visualization, operational, and analytic software. Dr. McNamara’s current work focuses on operational and analytic workflows in remote sensing environments. She is also an expert on visual cognitive workflows in team environments, focused on the role of user interfaces and analytic technologies to support exploratory data analysis and information creation with large, disparate, unwieldy datasets, from text to remote sensing. Dr. McNamara has longstanding interest in the epistemology and practices of computational modeling and simulation, verification and validation, and uncertainty quantification. She has worked with the National Geospatial-Intelligence Agency, the Missile Defense Agency, the Defense Intelligence Agency, and the nuclear weapons programs at Sandia and Los Alamos National Laboratories to enhance the effective use of modeling and simulation in interdisciplinary R&D projects. |
Breakout![]() |
![]() | 2020 | ||||||||
Building an End-to-End Model of Super Hornet Readiness (Abstract)
Co-authors: Benjamin Ashwell, Edward Beall, and V. Bram Lillard Bottom-up emulations of real sustainment systems that explicitly model spares, personnel, operations, and maintenance are a powerful way to tie funding decisions to their impact on readiness, but they are not widely used. The simulations require extensive data to properly model the complex and variable processes involved in a sustainment system, and the raw data used to populate the simulation are often scattered across multiple organizations. The Navy has encountered challenges in sustaining the desired number of F/A-18 Super Hornets in mission capable states. IDA was asked to build an end-to-end model of the Super Hornet sustainment system using the OPUS/SIMLOX suite of tools to investigate the strategic levers that drive readiness. IDA built an R package (“honeybee”) that aggregates and interprets Navy sustainment data using statistical techniques to create component-level metrics, and a second R package (“stinger”) that uses these metrics to create a high-fidelity representation of the Navy’s operational tempo for OPUS/SIMLOX. |
Benjamin Ashwell Research Staff Member IDA |
Breakout | 2020 | |||||||||
The OTA Perspective on the Challenges of Testing in the Evolving Acquisition Environment and Proposed Mitigations (Abstract)
During the fall of 2019, AFOTEC led a cross-OTA rapid improvement event to identify and mitigate challenges Operational Test (OT) teams from all services are facing in the evolving acquisition environment. Surveys were sent out to all of the service OTAs, AFOTEC compiled the results, and a cross-OTA meeting was held at the Institute for Defense Analyses (IDA) to determine the most significant challenges to address. This presentation discusses the selected challenges and the proposed mitigations that were briefed to the Director, Operational Test and Evaluation (DOT&E) and the Service OTA Commanders at the OTA Roundtable in November 2019. |
Leisha Scheiss AFOTEC |
Breakout | 2020 | |||||||||
Quantifying Uncertainty in Reporting System Usability Scale Results (Abstract)
Much work has been conducted over the last two decades on standardizing usability surveys for determining usability of a new system. In this presentation, we analyze not what we ask users, but rather how we report results from standard usability surveys such as the System Usability Scale (SUS). When the number of individuals surveyed is large, classical statistical techniques can be leveraged; however, as we will demonstrate, due to the skewness in the data these techniques may be suboptimal when the number of individuals surveyed is small. In such small-sample circumstances, we argue for use of the bias-corrected and accelerated bootstrap confidence interval. We further demonstrate how Bayesian inference can be leveraged to take advantage of the over 10 years’ worth of data that exists for SUS surveys. Finally, we demonstrate an online app that we have built to aid practitioners in quantifying the uncertainty in their SUS surveys. |
Nick Clark Assistant Professor West Point – Math Department |
Breakout | 2020 | |||||||||
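A minimal sketch of the small-sample interval described above, using SciPy's bias-corrected and accelerated (BCa) bootstrap on hypothetical SUS scores; this is not the authors' app or data.

```python
import numpy as np
from scipy import stats

# Hypothetical SUS scores from a small usability test (0-100 scale).
sus = np.array([72.5, 85.0, 60.0, 77.5, 90.0, 55.0, 82.5, 70.0])

# BCa bootstrap confidence interval for the mean SUS score.
res = stats.bootstrap(
    (sus,), np.mean,
    confidence_level=0.95,
    n_resamples=9999,
    method="BCa",
    random_state=0,
)

print(f"mean SUS = {sus.mean():.1f}")
print(f"95% BCa interval = ({res.confidence_interval.low:.1f}, "
      f"{res.confidence_interval.high:.1f})")
```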
Uncertainty Quantification and Check Standard Testing at NASA Glenn Research Facilities (Abstract)
Uncertainty quantification has been performed in various NASA Glenn Research Center (GRC) facilities over the past several years, primarily in the wind tunnel facilities. Uncertainty propagation analysis has received a bulk of the focus and effort put forth to date, and while it provides a vital aspect of the overall uncertainty picture, it must be supplemented by consistent data analysis and check standard programs in order to achieve and maintain a statistical basis for proven data quality. This presentation will briefly highlight the uncertainty propagation effort at NASA GRC, its usefulness, and the questions that remain unanswered. It will show how performing regular check standard testing fills in the gaps in the current UQ effort, will propose high-level test plans in two to three GRC facilities, and discuss considerations that need to be addressed in the planning process. |
Erin Hubbard Data Engineer Jacobs / NASA Glenn Research Center |
Breakout | 2020 | |||||||||
STAT COE Autonomy Test and Evaluation Workshop Highlights (Abstract)
The Scientific Test and Analysis Techniques (STAT) Center of Excellence (COE) and Science of Test Research Consortium Advancements in Test and Evaluation of Autonomous Systems (ATEAS) Workshop, held 29-31 October 2019, at the Wright Brothers Institute, is part of a study being conducted on behalf of the Office of the Secretary of Defense (OSD). The goal of the study is to determine the current state of autonomous systems used within the Department of Defense (DoD), industry, and academia, with a focus on the test, evaluation, verification, and validation of those systems. The workshop addressed two overarching study objectives: 1) identify and develop methods and processes, and identify lessons learned needed to enable rigorous test and evaluation (T&E) of autonomous systems; and 2) refine current challenges and gaps in DoD methods, processes, and test ranges to rigorously test and evaluate autonomous systems. The workshop also introduced the STAT COE’s data call to the DoD, industry, and academia. This data call further informs the autonomy community on the specific efforts of, and challenges faced by, collaborators in T&E of autonomous systems. Finally, the workshop provided STAT COE members with an excellent opportunity to form partnerships with members of the DoD with the intent of finding an autonomous system pilot program in each service branch that could benefit from STAT COE support and provide first-hand experience in T&E of autonomous systems. Major takeaways from the presentations and panels included updated information on the challenges originally identified by the STAT COE in 2015, as well as new challenges related to modeling and simulation and data ownership, storage, and sharing within the DoD. The workshop also yielded information about current efforts for T&E of autonomous systems in DoD, industry, and academia; two completed data calls; a targeted population to which to send the data call; and several potential pilot programs which the STAT COE could support. The next steps of the STAT COE will be to distribute the workshop report and data call, and engage with pilot programs to get first-hand experience in T&E of autonomous systems and attempt to find solutions to the challenges currently identified. |
Troy Welker Analyst STAT COE |
Breakout | 2020 | |||||||||
Automated Feature Extraction (Abstract)
Pivotal to current US military operations is the quick identification of buildings on Gridded Reference Graphics (GRGs), gridded satellite images of an objective. At present, the United States Special Operations Command (SOCOM) identifies these buildings by hand through the work of individual intelligence officers. Recent advances in Convolutional Neural Networks (CNNs), however, allow the possibility for this process to be streamlined through the use of object detection algorithms. In this presentation, we describe an object detection algorithm designed to quickly identify and highlight every building present on a GRG. Our work leverages both the U-Net and the Mask R-CNN architectures as well as a four-city dataset to produce an algorithm that accurately identifies a large breadth of buildings. Our model reports an accuracy of 87% and is capable of detecting buildings on a diverse set of test images. |
Samuel Humphries Student United States Military Academy |
Breakout | 2020 | |||||||||
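A heavily simplified, illustrative encoder-decoder in PyTorch in the spirit of the U-Net mentioned above; the actual model, GRG dataset, and training details from the presentation are not reproduced here.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # Two 3x3 convolutions with ReLU, the basic U-Net building block.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """One-level U-Net: downsample once, upsample once, one skip connection."""
    def __init__(self, in_ch=3, base=16):
        super().__init__()
        self.enc = conv_block(in_ch, base)
        self.down = nn.MaxPool2d(2)
        self.mid = conv_block(base, base * 2)
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, 1, 1)   # per-pixel building logit

    def forward(self, x):
        e = self.enc(x)
        m = self.mid(self.down(e))
        d = self.dec(torch.cat([self.up(m), e], dim=1))  # skip connection
        return self.head(d)

# One training step on a dummy image/mask pair, just to show the loop shape.
model, loss_fn = TinyUNet(), nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
img = torch.randn(1, 3, 128, 128)
mask = torch.randint(0, 2, (1, 1, 128, 128)).float()
opt.zero_grad()
loss = loss_fn(model(img), mask)
loss.backward()
opt.step()
print(f"dummy training loss: {loss.item():.3f}")
```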
Predicting System Failures – A Statistical Approach to Reliability Growth Modeling (Abstract)
Reliability, in colloquial terms, is the ability of a system or piece of equipment to perform some required function when and where we need it to. We argue that new and unproven military equipment can benefit from a statistical approach for modeling reliability growth. The modern “standard” for these programs is the AMSAA Planning Model based on Projection Methodology (PM2). We describe how to augment PM2 with a statistical perspective to make reliability prediction more “data informed.” We have developed teaching “modules” to help elucidate this process from the ground up. |
Kate Sanborn Invited Speaker North Carolina State University |
Breakout | 2020 | |||||||||
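As a hedged illustration of data-informed reliability-growth fitting, the sketch below uses the classical Crow-AMSAA (power-law NHPP) estimators rather than PM2 itself, applied to hypothetical failure times from a time-truncated development test.

```python
import numpy as np

# Hypothetical cumulative failure times (hours) from a development test
# that was time-truncated at T hours.
t = np.array([35.0, 80.0, 160.0, 310.0, 490.0, 700.0, 950.0, 1250.0])
T = 1500.0

# Crow-AMSAA (power-law NHPP) maximum-likelihood estimates, time-truncated case:
# expected cumulative failures through time x are lambda * x**beta.
n = t.size
beta_hat = n / np.sum(np.log(T / t))
lam_hat = n / T**beta_hat

# Instantaneous (current) failure intensity and MTBF at the end of test.
intensity_T = lam_hat * beta_hat * T ** (beta_hat - 1.0)
mtbf_T = 1.0 / intensity_T

print(f"beta = {beta_hat:.2f}  (beta < 1 indicates reliability growth)")
print(f"estimated MTBF at {T:.0f} h: {mtbf_T:.0f} h")
```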
Graph link prediction in computer networks using Poisson Matrix Factorisation (Abstract)
Graph link prediction is an important task in cyber-security: relationships between entities within a computer network, such as users interacting with computers, or clients connecting to servers, can provide key insights into adversary behaviour. Poisson matrix factorization (PMF) is a popular model for link prediction in large networks, particularly useful for its scalability. An extension to PMF to include scenarios that are commonly encountered in cyber-security applications is presented. Specifically, an extension is proposed to include known covariates associated with the graph nodes as well as a seasonal variation to handle dynamic networks. |
Melissa Turcotte Research Scientist Los Alamos National Laboratory |
Breakout | 2020 | |||||||||
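A minimal NumPy sketch of Poisson matrix factorization on a hypothetical matrix of connection counts, fitted with the standard multiplicative (Poisson/KL) updates; the covariate and seasonal extensions described in the talk are not included.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical counts: rows = client machines, columns = servers,
# entry (i, j) = number of observed connections in a training window.
X = rng.poisson(lam=1.5, size=(40, 25)).astype(float)

k, eps = 5, 1e-9                       # latent dimension, numerical floor
W = rng.random((X.shape[0], k)) + 0.1  # client factors
H = rng.random((k, X.shape[1])) + 0.1  # server factors

# Multiplicative updates that monotonically decrease the KL divergence,
# equivalent to maximizing the Poisson likelihood X_ij ~ Poisson((W H)_ij).
for _ in range(200):
    WH = W @ H + eps
    W *= (X / WH) @ H.T / (H.sum(axis=1) + eps)
    WH = W @ H + eps
    H *= W.T @ (X / WH) / (W.sum(axis=0)[:, None] + eps)

# Score unobserved (or future) links by their fitted Poisson rate.
rates = W @ H
i, j = 3, 7
print(f"predicted connection rate for pair ({i}, {j}): {rates[i, j]:.2f}")
```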
Trusted Collaborative Autonomous Systems for Multi Domain Operations (Abstract)
Every service Chief in the DoD is urging that the key to winning future wars is having the ability to rapidly understand our enemies and integrate maneuver across domains of land, sea, air and space – what is commonly referred to as multi-domain operations (MDO). Each service has some notion of how they will accomplish this and what it means for the pace and resilience of US warfighting but there is no universally agreed upon method to get there or even to define what “there” is. However, as far back as 2011, the DoD recognized the strategic significance of managing the disparate missions of communication and control, strike, surveillance, navigation, and electronic warfare as a single, integrated electromagnetic spectrum (ES) “maneuver space” or new strategic domain to span the traditional warfighting domains. This ES domain will be managed across many platforms (manned and unmanned) with many unique ES enabled payloads across vast distances in land, sea, air and space. Only by extending these platforms some degree of autonomy and providing them with the ability to collaborate will we achieve dominance in the ES domain. In essence, collaborative autonomous software services will form the backbone of MDO. We are already seeing this play out in a number of defense and commercial efforts and there is little doubt that we (and our adversaries) are already on the path. Using traditional DoD methods of test and evaluation may be the greatest obstacle to deploying these capabilities. The DevOps (a compound of Development and Operations) movement emerged in earnest in the early 2010s as a way for large enterprise software service companies (e.g. Amazon, Google, Netflix) to continuously innovate and deliver new products to the market. It has since proliferated across the globe. Three major ideas underlying the DevOps culture are of interest to us: (1) breaking down monolithic autonomous software services into multiple loosely coupled “microservices;” (2) containerizing services for portability and baked in security; and (3) performing “continuous testing.” In this presentation, we will discuss how adopting widely available DevOps tools and methodologies will make verification and validation of collaborative autonomous systems faster, more robust and transparent and enable the DoD’s goal of achieving truly effective and scalable multi domain operations. |
Robert Murphey Principal Research Engineer Georgia Tech Research Institute |
Breakout |
![]() | 2020 | ||||||||
Practical Applications for Functional Data Analysis in T&E (Abstract)
Testing today’s complex systems often requires advanced statistical methods to properly characterize and optimize measures of performance as a function of input factors. One promising area to more precisely model system behavior when the response is a curve or function over several measured time units is Functional Data Analysis (FDA). Some input factors such as sensor data could also be functions rather than held constant as is often assumed over the duration of the test run. Recent enhancements in common statistical software programs used across DoD and NASA now make FDA much more accessible to the analytical test community. This presentation will address the fundamental principles, workflow, and interpretation of results from FDA using a designed experiment for an autonomous system as an example. Additionally, we will address how to use FDA to establish a level of equivalence for modeling & simulation verification & validation efforts. |
James Wisnowski Principal Consultant and Co-owner Adsurgo LLC |
Breakout | 2020 | |||||||||
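A minimal functional-PCA sketch in the spirit of the FDA workflow described above, using simulated response curves, simple smoothing, and an SVD-based decomposition; the presentation's actual data and software workflow may differ.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical test data: 30 runs, each producing a response curve
# sampled at 100 common time points.
t = np.linspace(0.0, 10.0, 100)
curves = np.array([
    np.sin(t + rng.normal(scale=0.3)) * (1 + 0.1 * rng.normal())
    + rng.normal(scale=0.05, size=t.size)
    for _ in range(30)
])

# Light smoothing of each curve with a moving average (a simple stand-in
# for basis-function smoothing such as B-splines).
kernel = np.ones(5) / 5.0
smoothed = np.array([np.convolve(c, kernel, mode="same") for c in curves])

# Functional PCA: SVD of the mean-centered curve matrix. Rows of Vt are the
# principal component functions; scores summarize each run in a few numbers.
mean_curve = smoothed.mean(axis=0)
U, s, Vt = np.linalg.svd(smoothed - mean_curve, full_matrices=False)
var_explained = s**2 / np.sum(s**2)
scores = U * s

print(f"variance explained by first 2 FPCs: {var_explained[:2].sum():.2%}")
print(f"run 0 scores on FPC1, FPC2: {scores[0, :2].round(3)}")
```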
D-Optimally Based Sequential Test Method for Ballistic Limit Testing (Abstract)
Ballistic limit testing of armor is testing in which a kinetic energy threat is shot at armor at varying velocities. The striking velocity and whether the threat completely penetrated or partially penetrated the armor is recorded. The probability of penetration is modeled as a function of velocity using a generalized linear model. The parameters of the model serve as inputs to MUVES which is a DoD software tool used to analyze weapon system vulnerability and munition lethality. Generally, the probability of penetration is assumed to be monotonically increasing with velocity. However, in cases in which there is a change in penetration mechanism, such as the shatter gap phenomena, the probability of penetration can no longer be assumed to be monotonically increasing and a more complex model is necessary. One such model was developed by Chang and Bodt to model the probability of penetration as a function of velocity over a velocity range in which there are two penetration mechanisms. This paper proposes a D-optimally based sequential shot selection method to efficiently select threat velocities during testing. Two cases are presented: the case in which the penetration mechanism for each shot is known (via high-speed or post shot x-ray) and the case in which the penetration mechanism is not known. This method may be used to support an improved evaluation of armor performance for cases in which there is a change in penetration mechanism. |
Leonard Lombardo Mathematician U.S. Army Aberdeen Test Center (bio)
Leonard currently serves as an analyst for the RAM/ILS Engineering and Analysis Division at the U.S. Army Aberdeen Test Center (ATC). At ATC, he is the lead analyst for both ballistic testing of helmets and fragmentation analysis. Previously, while on a developmental assignment at the U.S. Army Evaluation Center, he worked towards increasing the use of generalized linear models in ballistic limit testing. Since then, he has contributed towards the implementation of generalized linear models within the test center through test design and analysis. |
Breakout![]() |
![]() | 2020 | ||||||||
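A simplified sketch of D-optimal sequential shot selection for an ordinary (monotone) logistic ballistic-limit model; the paper's two-mechanism Chang and Bodt model and the known/unknown-mechanism cases are not reproduced here, and the velocities and outcomes below are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical shots so far: striking velocities (ft/s) and outcomes
# (1 = complete penetration, 0 = partial penetration).
v = np.array([1500.0, 1550.0, 1600.0, 1650.0, 1700.0, 1750.0])
y = np.array([0, 0, 1, 0, 1, 1])

# Fit the current logistic model P(penetration) = logit^-1(b0 + b1 * v).
X = sm.add_constant(v)
fit = sm.Logit(y, X).fit(disp=False)

def info_matrix(velocities, beta):
    """Fisher information for logistic regression at the given design points."""
    Xc = sm.add_constant(velocities, has_constant="add")
    p = 1.0 / (1.0 + np.exp(-Xc @ beta))
    w = p * (1.0 - p)
    return (Xc * w[:, None]).T @ Xc

# Choose the next shot velocity from a candidate grid to maximize the
# determinant of the updated information matrix (local D-optimality).
candidates = np.linspace(1450.0, 1800.0, 71)
I_now = info_matrix(v, fit.params)
gains = [np.linalg.det(I_now + info_matrix(np.array([c]), fit.params))
         for c in candidates]
print(f"next shot velocity: {candidates[int(np.argmax(gains))]:.0f} ft/s")
```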
Improving Cyber Resiliency through the Application of Behavioral Science to Cybersecurity (Abstract)
The mission of the Human Behavior and Cybersecurity Capability at MITRE is to leverage human behavior to reduce cybersecurity risk using behavioral sciences to understand and strengthen the human firewall. The Capability consists of a team of experienced behavioral scientists who bring applied subject-matter-expertise in human behavior to cybersecurity challenges, with demonstrated successes and thought leadership in insider threat and usable security. This presentation will introduce the four focus areas of the Capability: Insider Threat; Usable Security and Technology Adoption; Cybersecurity Assessment and Exercise Support; and Cybersecurity Risk Perceptions and Awareness. We will then discuss the methods and metrics used to study human behavior within each of these focal areas. Insider Threat Assessment: Our insider threat behavioral risk assessments use qualitative research practices to elicit discussion and disclosure of risks from the perspective of potential insiders. MITRE’s Insider Threat Framework, developed to address the challenges experienced by the National Critical Infrastructure in classifying and assessing insider threats, is a data-driven framework that includes psycho-social and cyber-physical characteristics that could be common, observable indicators for insider attacks. The continuous approach to developing this framework includes consistently structuring, hand-coding, collating and analyzing a large dataset (5,000-10,000) of raw insider threat investigation case files shared directly from multiple organizations. The aggregated framework, but not the sensitive raw data, is shared with insider threat programs to operationalize and facilitate the identification, prevention, detection and mitigation of insider threats. Usable Security and Technology Adoption: Our goal of studying the human aspects of usable security technologies is to improve decision making and enhance technology adoption by users. We will present examples of how we have applied psychological principles and behavioral methods to design valid and reliable approaches to evaluate the feasibility of new security programs, products, and resources such as AI-enabled security technology. Cybersecurity Assessment and Exercise Support: Complex collaborations among cybersecurity teams are becoming increasingly important but can trigger new challenges that impact mission success. We assess, evaluate and train individuals and teams to improve knowledge among cyber professionals during cybersecurity exercises. Examples of our work include cognitive adaptability and exercise assessments, development of team collaboration and performance metrics, team sense-making, and cyber team resilience. Cybersecurity Risk Perceptions and Awareness: Perceptions of cybersecurity risks and threats, and resulting decisions about how to approach or mitigate them have the potential to impact the effectiveness of cybersecurity programs. We evaluate how people, processes and technology impact tactical and strategic risk-based decisions and apply behavioral concepts to inform the cybersecurity risk framework and awareness programs. |
Poornima Madhavan Principal Behavioral Scientist MITRE |
Breakout | 2020 | |||||||||
AI for Cyber (Abstract)
Recent developments in artificial intelligence have captured the popular imagination and the attention of governments and businesses across the world. From digital assistants, to smart homes, to self-driving cars, AI appears to be on the verge of taking over many parts of our daily lives. Meanwhile, as the world becomes more networked, industry, governments, and individuals are facing a growing array of cybersecurity threats. In this talk, we will discuss the intersection of artificial intelligence, machine learning, and cybersecurity. In particular, we will look at how some popular machine learning methods will and will not change how we do cybersecurity, we will separate what in the AI and ML space is realistic from what is science fiction, and we will attempt to identify the true potential of ML to positively impact cybersecurity. |
Adam Cardinal-Stakenas Data Science Lead NSA |
Breakout | 2020 | |||||||||
Spectral Embedding and Cyber Networks (Abstract)
We are given a time series of graphs, for example those defined by connections between computers in network flows. Several questions are relevant to cyber security: 1) What are the natural groupings (clustering) of the computers on the network, and how do these evolve in time? 2) Is the graph “abnormal” for this day/time and hence indicative of a large scale problem? 3) Are certain nodes acting “abnormally” or “suspiciously”? 4) Given that some computers cannot be uniquely resolved (due to various types of dynamic IP address assignments), can we pair a newly observed computer with its previous instantiation in an earlier hour? In this talk, I will give a very brief introduction to some spectral graph methods that have shown promise for answering some of these questions, and present some preliminary results. These methods are much more widely applicable, and if time permits I will discuss some of the areas to which they are currently being applied. |
David Marchette Principal Scientist Naval Surface Warfare Center, Dahlgren Division |
Breakout | 2020 | |||||||||
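A minimal sketch of the spectral embedding and clustering idea behind question 1 above, using a normalized graph Laplacian on a simulated adjacency matrix; this is illustrative only and not the speaker's specific pipeline.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)

# Hypothetical symmetric adjacency matrix for 60 hosts: two planted groups
# with denser within-group connectivity than between-group connectivity.
n, block = 60, 30
P = np.full((n, n), 0.02)
P[:block, :block] = P[block:, block:] = 0.25
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                   # undirected, no self-loops

# Symmetric normalized Laplacian and its smallest nontrivial eigenvectors.
L = laplacian(A, normed=True)
eigvals, eigvecs = np.linalg.eigh(L)
embedding = eigvecs[:, 1:3]                   # skip the trivial eigenvector

# Cluster hosts in the embedded space to recover the groupings.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedding)
print("hosts per cluster:", np.bincount(labels))
```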
Generalized Linear Multilevel Model Applied to a Forced-Choice Psychoacoustic Test (Abstract)
Linear regression is often extended to either (1) multilevel models in which response data cannot be assumed independent, e.g., nested data, or (2) generalized linear models in which response data is not normally distributed. Applying both extensions to binary responses results in generalized linear multilevel models in which a sigmoid link function gives the relationship between the mean of the response variable and a linear combination of predictors. Such models are well-suited to analyze psychophysical experiments, in which fitted sigmoids give the relationship between physical stimulus level and the probability of a human perceptual response. In this work, a generalized linear multilevel model is applied to a three-alternative forced-choice psychoacoustic test in which human test subjects were asked to identify a sound signal presented at different levels relative to a background noise. Since the guessing rate at low stimulus levels must converge to 1/3, a custom link function is applied. In this test, the grouping variable is the subject, because within-subject responses are assumed to be more alike than between-subject responses. This leads to readily available information about population-level parameters, such as how auditory thresholds are distributed within the group, how steep the psychometric functions are and if the differences are statistically significant. The multilevel model also demonstrates the effect of shrinkage in which the partially-pooled regression parameters are closer to the population mean than parameters found by un-pooled analyses. |
Matthew Boucher Research Engineer NASA Langley Research Center |
Breakout | 2020 | |||||||||
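A minimal single-subject sketch of the fixed-effect part of the model described above: a logistic psychometric function with its lower asymptote pinned at the 1/3 guessing rate, fit by maximum likelihood. The full multilevel model with by-subject random effects would typically be fit in dedicated mixed-model or Bayesian software; the levels and counts below are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

# Hypothetical 3AFC data: signal level (dB relative to noise), trials, correct.
level = np.array([-20.0, -16.0, -12.0, -8.0, -4.0, 0.0])
n_trials = np.full(level.size, 30)
n_correct = np.array([11, 13, 17, 23, 28, 30])

def p_correct(params, x):
    # Guessing-corrected psychometric function: 1/3 + 2/3 * logistic(b0 + b1*x).
    b0, b1 = params
    return 1.0 / 3.0 + (2.0 / 3.0) * expit(b0 + b1 * x)

def neg_log_lik(params):
    p = np.clip(p_correct(params, level), 1e-9, 1 - 1e-9)
    return -np.sum(n_correct * np.log(p) + (n_trials - n_correct) * np.log(1 - p))

fit = minimize(neg_log_lik, x0=np.array([0.0, 0.5]), method="Nelder-Mead")
b0, b1 = fit.x
# Threshold: level at which the probability of a correct response reaches 2/3.
print(f"slope = {b1:.2f} per dB, threshold ~ {-b0 / b1:.1f} dB")
```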
A Path to Fielding Autonomous Systems (Abstract)
Challenges in adapting the system design process, including test and evaluation, to autonomous systems have been well documented. These include the inability to verify autonomous behaviors, designing systems for assurance, and dealing with adaptive systems post-fielding. This talk will briefly recap those challenges, and then propose a path toward overcoming these challenges in the case of Army ground autonomous systems. The path begins with unmanned systems designed for test. Training and experimentation within a safe infrastructure can then be used to specify system characteristics and inform decisions as to what behaviors should be autonomous. Assurance comes in the form of assurance arguments updated as the system technology and understanding increases. |
Craig Lennon Autonomy CoI TEVV co-lead CCDC Army Research Lab |
Breakout | 2020 | |||||||||
Finding a Cheater: Machine Learning and the Dice Game “Craps” (Abstract)
Machine learning is becoming increasingly embedded in military systems, so how do we know if it’s working or, as the name implies, learning? This presentation and demonstration is a thought experiment for testers to consider how to measure and test systems with embedded machine learning. This presentation involves a physical demonstration of elements of the popular casino game craps. Craps is a fast-paced dice game that offers players the opportunity to try to beat the odds, which are stacked in the casino’s favor. But what if a player could flip the odds on the casino by cheating? Can the audience detect the cheater? How about a machine learning algorithm? The audience will be able to make their own guesses and compare them to A-Dell, a machine learning algorithm designed to find craps cheaters. |
Paul Johnson Scientific Advisor MCOTEA |
Breakout | 2020 | |||||||||
Dynamic Model Updating for Streaming Classification and Clustering (Abstract)
A common challenge in the cybersecurity realm is the proper handling of high-volume streaming data. Typically in this setting, analysts are restricted to techniques with computationally cheap model-fitting and prediction algorithms. In many situations, however, it would be beneficial to use more sophisticated techniques. In this talk, a general framework is proposed that adapts a broad family of statistical and machine learning techniques to the streaming setting. The techniques of interest are those that can generate computationally cheap predictions, but which require iterative model-fitting procedures. This broad family of techniques includes various clustering, classification, regression, and dimension reduction algorithms. We discuss applied and theoretical issues that arise when using these techniques for streaming data whose distribution is evolving over time. |
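A minimal sketch of the streaming pattern described above, assuming a scikit-learn linear classifier with an incremental partial_fit update; the simulated drifting data stream and batch size are illustrative only, and the talk's framework covers a much broader family of models.

```python
# Minimal sketch: cheap predictions on each arriving batch, then an incremental update.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])

for t in range(50):                      # 50 arriving batches
    drift = 0.02 * t                     # slow distribution shift over time
    X = rng.normal(loc=drift, scale=1.0, size=(100, 5))
    y = (X[:, 0] + X[:, 1] + rng.normal(size=100) > 2 * drift).astype(int)

    if t > 0:
        # Computationally cheap prediction on the new batch before updating the model.
        acc = clf.score(X, y)
        if t % 10 == 0:
            print(f"batch {t}: accuracy {acc:.2f}")

    # Incremental model update (no refit from scratch).
    clf.partial_fit(X, y, classes=classes)
```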
Alexander Foss Senior Statistician Sandia National Laboratories |
Breakout | 2020 | |||||||||
A Validation Case Study: The Environment Centric Weapons Analysis Facility (Abstract)
Reliable modeling and simulation (M&S) allows the undersea warfare community to understand torpedo performance in scenarios that could never be created in live testing, and do so for a fraction of the cost of an in-water test. The Navy hopes to use the Environment Centric Weapons Analysis Facility (ECWAF), a hardware-in-the-loop simulation, to predict torpedo effectiveness and supplement live operational testing. In order to trust the model’s results, the T&E community has applied rigorous statistical design of experiments techniques to both live and simulation testing. As part of ECWAF’s two-phased validation approach, we ran the M&S experiment with the legacy torpedo and developed an empirical emulator of the ECWAF using logistic regression. Comparing the emulator’s predictions to actual outcomes from live test events supported the test design for the upgraded torpedo. This talk overviews the ECWAF’s validation strategy, decisions that have put the ECWAF on a promising path, and the metrics used to quantify uncertainty. |
Elliot Bartis Research Staff Member IDA (bio)
Elliot Bartis is a research staff member at the Institute for Defense Analyses where he works on test and evaluation of undersea warfare systems such as torpedoes and torpedo countermeasures. Prior to coming to IDA, Elliot received his B.A. in physics from Carleton College and his Ph.D. in materials science and engineering from the University of Maryland in College Park. For his doctoral dissertation, he studied how cold plasma interacts with biomolecules and polymers. Elliot was introduced to model validation through his work on a torpedo simulation called the Environment Centric Weapons Analysis Facility. In 2019, Elliot and others involved in the MK 48 torpedo program received a Special Achievement Award from the International Test and Evaluation Association in part for their work on this simulation. Elliot lives in Falls Church, VA with his wife Jacqueline and their cat Lily. |
Breakout | 2020 | |||||||||
The Role of Uncertainty Quantification in Machine Learning (Abstract)
Uncertainty is an inherent, yet often under-appreciated, component of machine learning and statistical modeling. Data-driven modeling often begins with noisy data from error-prone sensors collected under conditions for which no ground-truth can be ascertained. Analysis then continues with modeling techniques that rely on a myriad of design decisions and tunable parameters. The resulting models often provide demonstrably good performance, yet they illustrate just one of many plausible representations of the data – each of which may make somewhat different predictions on new data. This talk provides an overview of recent, application-driven research at Sandia Labs that considers methods for (1) estimating the uncertainty in the predictions made by machine learning and statistical models, and (2) using the uncertainty information to improve both the model and downstream decision making. We begin by clarifying the data-driven uncertainty estimation task and identifying sources of uncertainty in machine learning. We then present results from applications in both supervised and unsupervised settings. Finally, we conclude with a summary of lessons learned and critical directions for future work. |
David Stracuzzi Research Scientist Sandia National Laboratories |
Breakout | 2020 | |||||||||
Data Analysis for Special Operations Selection Programs (Abstract)
This study assesses the relationship between psychometric screening of candidates for our partner special operations unit and successful completion of the unit’s candidate selection program. Improving the candidate selection program is our primary goal for this study. Our partner unit maintains a comprehensive database of summary information on previous candidates but has not yet conducted a robust analysis of candidate attributes. We sought to achieve this goal by using statistical methods to determine predictors associated with successful completion of the selection program. Our results suggest that we may identify predictors associated with success but may struggle in constructing an effective predictive model due to the inherent differences of candidates. Our predictors are scales from standardized psychometric evaluations administered by the selection program intended originally to identify candidates with psychopathologies. Our outcome is a binary variable indicating successful completion of the program. Analyzing the demographics of the candidate selection population is our secondary goal for this study. Our analysis helped our partner unit improve its selection program by identifying characteristics associated with success. Other members of the special operations community may generalize our results to improve their respective programs as well. We do not intend for units to use these results to draw definitive conclusions regarding the ability of a candidate to pass a selection program. |
Nicholas Cunningham Cadet United States Military Academy |
Breakout | 2020 | |||||||||
Statistical Analysis of a Transonic Aerodynamic Calibration (Abstract)
The Monte Carlo method of uncertainty analysis was used to characterize the uncertainty of tunnel conditions within the calibration of a wind tunnel. To calibrate the tunnel, a long static pipe consisting of 444 static pressure ports was used, and data from this calibration provided the inputs to the analysis. The Monte Carlo analysis samples from the uncertainty of these inputs and propagates each sample through the equations for tunnel conditions. A method of uncertainty analysis would normally encompass precision, bias, and fossilized uncertainty; however, for this particular analysis, precision uncertainty is assumed to be negligible. All remaining uncertainties were calculated using a coverage factor of 2, corresponding to a 95% confidence level for a normal (Gaussian) distribution. At the end of the analysis, a comparison was done to see the effect of fossilized uncertainty on the overall uncertainty, which showed that this often-overlooked component does cause a noticeable change in the overall uncertainty of the value. With fossilized uncertainty included, the Monte Carlo method is a useful approach for characterizing the uncertainty of tunnel conditions in a wind tunnel calibration. |
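A generic sketch of Monte Carlo uncertainty propagation for a tunnel condition is shown below. The isentropic Mach-number relation and the assumed pressure uncertainties are illustrative stand-ins, not the facility's actual calibration equations or values.

```python
# Generic Monte Carlo propagation of measurement uncertainty through a tunnel condition.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
gamma = 1.4

# Nominal measurements with assumed standard uncertainties (Pa).
p0 = rng.normal(101_325.0, 150.0, n)   # total (stagnation) pressure
p = rng.normal(72_000.0, 120.0, n)     # static pressure from the pipe ports

# Propagate each sampled pair through the (illustrative) isentropic Mach relation.
mach = np.sqrt(2.0 / (gamma - 1.0) * ((p0 / p) ** ((gamma - 1.0) / gamma) - 1.0))

mean = mach.mean()
u = mach.std(ddof=1)
print(f"Mach = {mean:.4f} +/- {2 * u:.4f} (coverage factor k = 2, ~95%)")
```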
Lindsey Drone Data Engineer NASA Ames Research Center |
Breakout | 2020 | |||||||||
Statistical Engineering for Service Life Prediction of Polymers (Abstract)
Economically efficient selection of materials depends on knowledge of not just the immediate properties, but the durability of those properties. For example, when selecting building joint sealant, the initial properties are critical to successful design. These properties change over time and can result in failure in the application (buildings leak, glass falls). A NIST-led industry consortium has a research focus on developing new measurement science to determine how the properties of the sealant change with environmental exposure. In this talk, the two-decade history of the NIST-led effort will be examined through the lens of Statistical Engineering, specifically its 6 phases: (1) Identify the problem. (2) Provide structure. (3) Understand the context. (4) Develop a strategy. (5) Develop and execute tactics. (6) Identify and deploy a solution. Phases 5 and 6 will be the primary focus of this talk, but all of the phases will be discussed. The tactics of phase 5 were often themselves multi-month or multi-year research problems. Our approach to predicting outdoor degradation based only on accelerated weathering in the laboratory has been revised and improved many times over several years. In phase 6, because of NIST’s unique mission of promoting U.S. innovation and industrial competitiveness, the focus has been outward on technology transfer and the advancement of test standards. This may differ from industry and other government agencies where the focus may be improvement of processes inside of the organization. |
Adam Pintar Mathematical Statistician National Institute of Standards and Technology (bio)
Adam Pintar is a Mathematical Statistician at the National Institute of Standards and Technology. He applies statistical methods and thinking to diverse application areas including Physics, Chemistry, Biology, Engineering, and more recently Social Science. He received a PhD in Statistics from Iowa State University. |
Breakout | 2020 | |||||||||
Automated Road Extraction from Satellite Images (Abstract)
In this presentation we will discuss a methodology to automatically extract road networks from aerial images using machine learning algorithms. There are many civilian applications for this technology, such as maintaining GPS maps, real-estate monitoring, and disaster relief, but this presentation is aimed at military applications for analyzing remote sensing data. The algorithm we propose identifies road networks and classifies each road as either improved or unimproved. Implementing this system in a military context will require our model to work across a wide range of environments. We will discuss the effectiveness of the model in that context. |
Trevor Parker Cadet United States Military Academy |
Breakout | 2020 | |||||||||
The Science of Trust of Autonomous Unmanned Systems (Abstract)
The world today is witnessing a significant investment in autonomy and artificial intelligence that most certainly will result in ever-increasing capabilities of unmanned systems. Driverless vehicles are a great example of systems that can make decisions and perform very complex actions. The reality, though, is that while it is well understood what these systems are doing, it is not well understood at all ‘how’ the intelligence engines are generating decisions to accomplish those actions. Therein lies the underlying challenge of accomplishing formal test and evaluation of these systems and, relatedly, of how to engender trust in their performance. This presentation will outline and define the problem space, discuss those challenges, and offer solution constructs. |
Reed Young Program Manager for Robotics and Autonomy Johns Hopkins University Applied Physics Laboratory |
Breakout | 2020 | |||||||||
The Role of Sensitivity Analysis in Evaluating the Credibility of Machine Learning (Abstract)
We discuss several shock-physics applications we are pursuing using deep neural network approaches. These range from equation-of-state (EOS) inference for simple flyer-plate impact experiments, when velocity time series are observed, to radiographic inversion of high-speed impact simulations with composite materials. We put forward a methodology to leverage privileged information available from high-fidelity simulations to train a deep network prior to application on experimental observations. Within this process, several sources of uncertainty are active, and we discuss ways to mitigate these by careful structuring of the simulated training set. Once a network is trained, the credibility of its inference, beyond performance on a test set, must be assessed. Without robust feature detection tools available for deep neural networks, we show that much can be gained by applying classical sensitivity analysis techniques to the trained network. We show some results of this sensitivity analysis for our physics applications and discuss the caveats and pitfalls that arise when applying sensitivity analysis to deep machine learning algorithms. |
Kyle Hickmann Scientist Los Alamos National Laboratory |
Breakout | 2020 | |||||||||
Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning (Abstract)
We study the problem of post-hoc calibration of machine learning classifiers. We introduce the following desiderata for uncertainty calibration: (a) accuracy-preserving, (b) data-efficient, and (c) high expressive power. We show that none of the existing methods satisfy all three requirements, and demonstrate how our proposed calibration strategies can help achieve dramatically better data efficiency and expressive power while provably preserving classification accuracy of the original classifier. When calibrating a 50-layer Wide ResNet on ImageNet classification task, the proposed strategies improve the expressivity of temperature scaling by 17% and data efficiency of isotonic regression by a factor of 258, while preserving classification accuracy. |
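For context, the sketch below shows the temperature-scaling baseline that the abstract's strategies build on: a single temperature is fit on held-out logits by minimizing negative log-likelihood, and because dividing logits by a positive constant never changes the argmax, classification accuracy is preserved. The logits and labels here are synthetic.

```python
# Minimal sketch of post-hoc calibration by temperature scaling.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n, k = 500, 10
labels = rng.integers(0, k, n)
# Synthetic over-confident logits: the correct class is boosted by a large margin.
logits = rng.normal(0.0, 1.0, (n, k))
logits[np.arange(n), labels] += 4.0

def nll(T):
    z = logits / T
    z -= z.max(axis=1, keepdims=True)          # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(n), labels].mean()

opt = minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded")
print("fitted temperature:", round(opt.x, 3))
```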
Jize Zhang Postdoctoral Research Staff Member Lawrence Livermore National Laboratory |
Breakout | 2020 | |||||||||
Process/Workflow-Oriented View for Decision-Support Systems (Abstract)
Across application domains, analysts are tasked with an untenable situation of manually completing a big data analysis of a mix of quantitative and qualitative information sets. Human decision-making requires that evidence gathered from sources such as experiments, engineering analysis, and expert judgment be transformed into an appropriate format and presentation style. Distillation and interpretation of multi-source data can be supported through tools or decision-support systems that include automated features to reduce the mental burden on human analysts. Analysts benefit from a data-informed support tool that provides the correct information, at the right time, to arrive at the correct solution under uncertainty. My research has coupled a process/workflow-oriented view with knowledge and skill elicitation techniques to predict information analysts need and how they interact with and transform that information. Thus, this data-informed approach allows for a mapping of process steps and tools that will benefit analysts. As the state of data as it exists today is only likely to grow, not diminish over time, an approach to efficiently organizing and interpreting the data is crucial. Ultimately, improved decision making is realized across an entire workflow that is sustainable across time. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. |
Nicole Murchison Systems Research and Analysis Sandia National Laboratories |
Breakout | 2020 | |||||||||
Uncertainty Quantification and Decomposition Methods for Risk-Sensitive Machine Learning (Abstract)
Machine learning methods have attracted a lot of research attention in scientific applications. However, model credibility remains a major challenge toward the reliable deployment of these models to the field in costly or risky use cases. In this talk, we explore uncertainty quantification techniques to assess the quality of neural network predictions in regression and classification problems. To understand model variability, we examine different sources of randomness associated with training samples, data observation order, weight initialization, dropout, and ensemble formations. Motivated by typical scientific computing applications, we assume a limited sample budget and suggest approaches for reporting and possibly reducing uncertainty. |
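A minimal sketch of one source of variability named above: an ensemble over random weight initializations (and shuffling) whose spread serves as a rough uncertainty estimate for a regression prediction. The model size, data, and ensemble size are illustrative assumptions.

```python
# Minimal sketch: predictive spread from an ensemble over random initializations.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

x_new = np.array([[0.5], [2.9]])            # one interior point, one near the edge
preds = []
for seed in range(10):                      # vary initialization / shuffling only
    model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                         random_state=seed).fit(X, y)
    preds.append(model.predict(x_new))

preds = np.array(preds)
print("mean prediction:", preds.mean(axis=0))
print("ensemble std   :", preds.std(axis=0))
```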
Ahmad Rushdi Member of Technical Staff Sandia National Laboratories |
Breakout | 2020 | |||||||||
Sequential Testing and Simulation Validation for Autonomous Systems (Abstract)
Autonomous systems are expected to play a significant role in the next generation of DoD acquisition programs. New methods need to be developed and vetted, particularly for two groups we know well that will be facing the complexities of autonomy: a) test and evaluation, and b) modeling and simulation. For test and evaluation, statistical methods that are routinely and successfully applied throughout DoD need to be adapted to be most effective in autonomy, and some of our practices need to be stressed. One is sequential testing and analysis, which we illustrate to allow testers to learn and improve incrementally. The other group needing to rethink practices best for autonomy is the modeling and simulation community. Proposed are some statistical methods appropriate for modeling and simulation validation for autonomous systems. We look forward to your comments and suggestions. |
Jim Simpson Principal JK Analytics |
Breakout | 2020 | |||||||||
Balancing Human & Virtual Decision Making for Identifying Fraudulent Credit Card Activity (Abstract)
According to Forbes, merchants in the United States lose approximately $190 billion annually to credit card fraud (Shaughnessy 2012). Nordstrom, a leading retailer in the fashion industry, alone incurs losses upwards of $100 million every year. While these losses hurt the company financially, the more pressing concern for Nordstrom is the negative impact on the customer. If fraud is incorrectly flagged and an account is unnecessarily frozen, a customer will be dissatisfied with their experience. Conversely, if legitimate fraud goes undetected, valued customers can experience monetary loss and lose faith in the company. To minimize losses while maximizing customer satisfaction, Nordstrom Card Services created a fraud detection machine that analyzes every transaction and will take one of four actions based on the transaction’s characteristics: approve, approve and send to a queue, decline and send to a queue, or decline. Once a transaction is sent to the queue, it is assessed by either a human analyst or a virtual analyst. Those analysts will determine if any further action must be taken within the customer’s account. This project focuses on how to appropriately assign human and virtual decision-makers to maximize accuracy when determining whether or not a credit card transaction is fraud. |
Rao Abdul Hannan Cadet United States Air Force Academy |
Breakout | 2020 | |||||||||
Integrating Systems Engineering into Test Strategy Development and Systems Evaluation (Abstract)
As defense systems become more complex, multi-domain, and interdependent, the problem arises: What is the best way to determine what we need to test, and how much testing is adequate? A methodology, based on systems engineering, was developed specifically for use in Live Fire Test and Evaluation (LFT&E); however, the process can be applied to operational or developmental testing as well. The use of Systems Engineering principles involves understanding the prioritized warfighter (user) needs, and applies a definition process that identifies critical test issues. The main goals of this methodology are to clearly define the system performance priorities, based on the critical mission risks if performance is insufficient, and then to generate various solutions that will mitigate those associated risks. The systems engineering process also helps develop a common language and framework so all parties can discuss tradeoffs between cost, schedule and performance. It also produces specific products for each step of the process that ensure that each step has been adequately addressed. Most importantly, the methodology should enable better communication among and within the program, test, and oversight teams by clarifying mission and test priorities, clarifying test objectives, evaluating risks from proposed testing and demonstrated performance, and reporting decision-quality information. The result of implementing the methodology is two-fold: It produces a test strategy that prioritizes testing based on the criticality and uncertainty of the system’s performance; and it guides the development of a system evaluation that clearly links test outcomes to overall mission risks. |
Charlie Middleton Test and Evaluation Expert OSD Scientific Test and Analysis Techniques Center of Excellence |
Breakout | 2020 | |||||||||
The Role of Statistical Engineering in Creating Solutions for Complex Opportunities (Abstract)
Statistical engineering is the art and science for addressing complex organizational opportunities with data. The span of statistical engineering ranges from the “problems that keep CEOs awake at night” to the analysts dealing with the results of the experimentation necessary for the success of their most current project. This talk introduces statistical engineering and its full spectrum of approaches to complex opportunities with data. The purpose of this talk is to set the stage for the two specific case studies that follow it. Too often, people lose sight of the big picture of statistical engineering by a too narrow focus on the specific case studies. Too many people walk away thinking “This is what I have been doing for years. It is simply good applied statistics.” These people fail to see what we can learn from each other through the sharing of our experiences to teach other people how to create solutions more efficiently and effectively. It is this big picture that is the focus of this talk. |
Geoff Vining Professor Virginia Tech (bio)
Geoff Vining is a Professor of Statistics at Virginia Tech, where from 1999 – 2006, he also was the department head. He holds an Honorary Doctor of Technology from Luleå University of Technology. He is an Honorary Member of the ASQ (the highest lifetime achievement award in the field of Quality), an Academician of the International Academy for Quality, a Fellow of the American Statistical Association (ASA), and an Elected Member of the International Statistical Institute. He is the Founding and Current Past-Chair of the International Statistical Engineering Association (ISEA). He is a founding member of the US DoD Science of Test Research Consortium. Dr. Vining won the 2010 Shewhart Medal, the ASQ career award given to the person who has demonstrated the most outstanding technical leadership in the field of modern quality control. He also received the 2015 Box Medal from the European Network for Business and Industrial Statistics (ENBIS). This medal recognizes a statistician who has remarkably contributed to the development and the application of statistical methods in European business and industry. In 2013, he received an Engineering Excellence Award from the NASA Engineering and Safety Center. He received the 2011 William G. Hunter Award from the ASQ Statistics Division for excellence in statistics as a communicator, consultant, educator, innovator, and integrator of statistics with other disciplines and an implementer who obtains meaningful results. Dr. Vining is the author of three textbooks. He is an internationally recognized expert in the use of experimental design for quality, productivity, and reliability improvement and in the application of statistical process control. He has extensive consulting experience, most recently with the U.S. Department of Defense through the Science of Test Research Consortium and with NASA. |
Breakout | 2020 | |||||||||
Connecting Software Reliability Growth Models to Software Defect Tracking (Abstract)
Co-Author: Melanie Luperon. Most software reliability growth models only track defect discovery. However, a practical concern is removal of high-severity defects, yet defect removal is often assumed to occur instantaneously. More recently, several defect removal models have been formulated as differential equations in terms of the number of defects discovered but not yet resolved and the rate of resolution. The limitation of this approach is that it does not take into consideration data contained in a defect tracking database. This talk describes our recent efforts to analyze data from a NASA program. Two methods to model defect resolution are developed, namely (i) distributional and (ii) Markovian approaches. The distributional approach employs times between defect discovery and resolution to characterize the mean resolution time and derives a software defect resolution model from the corresponding software reliability growth model to track defect discovery. The Markovian approach develops a state model from the stages of the software defect lifecycle as well as a transition probability matrix and the distributions for each transition, providing a semi-Markov model. Both the distributional and Markovian approaches employ a censored estimation technique to identify the maximum likelihood estimates, in order to handle the case where some but not all of the defects discovered have been resolved. Furthermore, we apply a hypothesis test to determine whether a first- or second-order Markov chain best characterizes the defect lifecycle. Our results indicate that a first-order Markov chain was sufficient to describe the data considered and that the Markovian approach achieves modest improvements in predictive accuracy, suggesting that the simpler distributional approach may be sufficient to characterize the software defect resolution process during test. The practical inferences of such models include an estimate of the time required to discover and remove all defects. |
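A minimal sketch of the censored-estimation idea behind the distributional approach, assuming (purely for illustration) exponentially distributed resolution times: resolved defects contribute density terms to the likelihood, while still-open defects contribute only survival terms. The data values are synthetic.

```python
# Minimal sketch of censored maximum likelihood for defect resolution times.
import numpy as np
from scipy.optimize import minimize_scalar

# Days from discovery to resolution (resolved) or to the analysis date (still open).
resolved_days = np.array([3.0, 8.0, 1.0, 15.0, 6.0, 2.0, 11.0])
open_days = np.array([20.0, 9.0, 4.0])     # right-censored observations

def neg_log_lik(rate):
    # Exponential density for resolved defects, survival function for open ones.
    ll = np.sum(np.log(rate) - rate * resolved_days) + np.sum(-rate * open_days)
    return -ll

opt = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0), method="bounded")
rate_hat = opt.x
print("estimated mean resolution time (days):", round(1.0 / rate_hat, 2))
```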
Lance Fiondella Associate Professor University of Massachusetts (bio)
Lance Fiondella is an associate professor of Electrical and Computer Engineering at the University of Massachusetts Dartmouth. He received his PhD (2012) in Computer Science and Engineering from the University of Connecticut. Dr. Fiondella’s papers have received eleven conference paper awards, including six with his students. His software and system reliability and security research has been funded by the DHS, NASA, Army Research Laboratory, Naval Air Warfare Center, and National Science Foundation, including a CAREER Award. |
Breakout | 2020 | |||||||||
The Challenge of Data for Predicting Human Errors (Abstract)
The field of human reliability analysis (HRA) seeks to predict sources of human error for safety-critical systems. Much of the dominant research in HRA was historically conducted for nuclear power, where regulations required risk models of hardware and humans to ensure the safe operation of nuclear power plants in the face of potential accident situations. A challenge for HRA is that nuclear accidents have fortunately been extremely rare. Thus, actuarial data do not serve as a good source for informing predictions. Simulators make it possible to train performance for rare events, but data collection has largely focused on performance where human errors have occurred, omitting a denominator of good and bad performance that would be useful for prediction. As HRA has branched into new domains, there is the additional challenge of determining to what extent data available primarily from control room operations can be generalized to other types of human activities. To address these shortcomings, new data collection efforts are underway, focusing on (1) more comprehensive logging of operator performance in simulators, (2) development of new testbeds for collecting data, and (3) data mining of existing human performance data that were not originally intended for human error analysis. In this talk, I’ll review underlying assumptions of HRA and map new data sources to solving data needs in new domains such as defense and aerospace. |
Thomas Ulrich Human Factors & Reliability Associate Scientist Idaho National Laboratory |
Breakout | 2020 | |||||||||
Creating Insight into Joint Cognitive System Performance (Abstract)
The DOD’s ability to collect data is far outstripping its ability to convert that data into actionable information. This problem is not unique to the military. With vast improvements in our ability to collect data, many organizations are drowning in that data. As a result, the need for advanced algorithms (Artificial Intelligence, Machine Learning, etc.) to support human work has never been greater. However, algorithm performance and the performance of the larger work system, composed of human and machine agents, are not synonymous. In order to build and field high performance work systems that consistently provide positive mission impact, the S&T community must measure the performance of the entire Joint Cognitive System (JCS). Algorithm performance, by itself, is necessary but not sufficient for predicting the performance of these systems in the field. Better predicting JCS performance once a system is fielded requires modelling, or at least learning about, the humans’ cognitive work and how the proposed technology will change that work. For example, providing practitioners with a new Machine Learning enabled support tool may impose unintended cognitive work as practitioners calibrate themselves to the performance envelope of the new tools. In addition, the practitioners’ cognitive work is shaped by many “soft constraints” which are often not captured by technologists. For example, real world time constraints may lead practitioners to use simple decision making heuristics, rather than deliberative hypothesis testing, when making critical decisions. The technology requirements needed to support both deliberate and heuristic decision making may be very different. This talk will discuss several approaches to gain insight on JCS performance as part of a larger cycle of iteratively discovering, building, and testing these systems. Additionally, this talk will give an overview of a small pilot study conducted by the Air Force Research Lab to measure the impact of a simulated Machine Learning agent designed to support AF intelligence analysts exploiting Full Motion Video data. |
Taylor Murphy Cognitive Systems Engineer Air Force Research Laboratory |
Breakout | 2020 | |||||||||
Introduction to Structural Equation Modeling: Implications for Human-System Interactions (Abstract)
Structural Equation Modeling (SEM) is an analytical framework that offers unique opportunities for investigating human-system interactions. SEM is used heavily in the social and behavioral sciences, where emphasis is placed on (1) explanation rather than prediction, and (2) measuring variables that are not observed directly. The framework facilitates modeling of survey data through confirmatory factor analysis and latent (i.e., unobserved) variable regression models. We provide a general introduction to SEM by describing what it is, the unique features it offers to analysts and researchers, and how it is easily implemented in JMP Pro 15.1. The introduction relies on a fun example everyone can relate to. Then, we shed light on a few published studies that have used SEM to unveil insights on human performance factors and the mechanisms by which performance is affected. The key goal of this presentation is to provide general exposure to a modeling tool that is likely new to most in the fields of defense and aerospace. |
Laura Castro-Schilo Research Statistician Developer JMP Division, SAS Institute, Inc. |
Breakout | 2020 | |||||||||
Optimal Designs for Multiple Response Distributions (Abstract)
Having multiple objectives is common in experimental design. However, a design that is optimal for a normal response can be very different from a design that is optimized for a nonnormal response. This application uses a weighted optimality criterion to identify an optimal design with continuous factors for three different response distributions. Both linear and nonlinear models are incorporated with normal, binomial and Poisson response variables. A JMP script employs a coordinate exchange algorithm that seeks to identify a design that is useful for all three of these responses. The impact of varying the prior distributions on the nonlinear parameters as well as changing the weights on the responses in the criterion is considered. |
Brittany Fischer Arizona State University |
Breakout | 2020 | |||||||||
Orbital position in space: What is Truth? A study comparing two astrodynamics systems (Abstract)
The Air Force maintains a space catalog of orbital information on tens of thousands of space objects, including active satellites and satellite debris. The Air Force also computationally projects where it expects objects to be in the future. Each day, the Air Force issues warnings to satellite owner-operators about potential conjunctions (space objects passing near each other), which often results in one or both of the satellites maneuvering (if possible) for safety of flight. This problem grows worse as mega-constellations, such as SpaceX’s Starlink, are launched. |
Jason Sheldon Research Staff Member IDA |
Breakout | 2020 | |||||||||
Machine Learning Reveals that Russian IRA’s Twitter Topic Patterns Evolved over Time (Abstract)
Introduction: Information Operations (IO) are a key component of our adversaries’ strategy to undermine U.S. military power without escalating to more traditional (and more easily identifiable) military strikes. Social media activity is one method of IO. In 2017 and 2018, Twitter suspended thousands of accounts likely belonging to the Kremlin-backed Internet Research Agency (IRA). Clemson University archived a large subset of these tweets (2.9M tweets posted by over 2800 IRA accounts), tagged each tweet with metadata (date, time, language, supposed geographical region, number of followers, etc.), and published this dataset on the polling aggregation website FiveThirtyEight. |
Emily Parrish Research Associate IDA |
Breakout | 2020 | |||||||||
Combining Physics-Based Simulation & Machine Learning for Fast Uncertainty Quantification (Abstract)
With the rise of machine learning and artificial intelligence, there has been a huge surge in data-driven approaches to solve computational science and engineering problems. In the context of uncertainty quantification (UQ), a common use case for machine learning is in the construction of efficient surrogate models (i.e., response surfaces) to replace expensive, physics-based simulations. However, relying solely on data-driven models for UQ without any further recourse to the original high-fidelity simulation will generally produce biased estimators and can yield unreliable or non-physical results, especially when training data is sparse or predictions are required outside of the training data domain. |
James Warner Computational Scientist NASA Langley Research Center |
Breakout | 2020 | |||||||||
Expanding the Systems Test Engineer’s Toolbox A Multi-Criteria Decision Analysis Technique using the One-Sided Tolerance Interval for Analysis of DOE Planned Tests (Abstract)
Co-Authors: Luis Cortes (MITRE) and Alethea Duhon (Air Force Agency for Modeling and Simulation). This talk highlights the advantage of a multi-criteria decision analysis (MCDA) technique that utilizes the one-sided tolerance interval (OSTI) for the analysis of response data resulting from design of experiments (DOE) structured testing. Tolerance Intervals (TI) can be more intuitive for representing data and, when combined with DOE structured test results, are excellent for providing decision quality information. The use of statistical techniques in planning provides a rigorous approach for the analysis of test data, an ideal input for an MCDA technique. This article’s findings demonstrate the value of constructing the OSTI for the interpretation of DOE structured tests. We also demonstrate the utility of using an optimized model combined with the OSTI as part of an MCDA technique for choosing between alternatives, which offers a powerful method to sift through test results rapidly and efficiently. This technique provides an alternative for analyzing data across multiple alternatives when traditional analysis techniques, such as descriptive statistics (including the Confidence Interval [CI]), potentially obscure information from the untrained eye. Finally, the technique provides a level playing field–a critical feature given today’s acquisition protest culture. |
Michael Sheehan Principal Engineer MITRE |
Breakout | 2020 | |||||||||
Design Fractals: Visualizing the Coverage Properties of Covering Arrays (Abstract)
Identifying test cases that maximize the chance of discovering faults at minimal cost is a challenge that software test engineers constantly face. Combinatorial testing is an effective test case selection strategy to address this challenge. The idea is to select test cases that ensure that all possible combinations of settings from two (or more) inputs are accounted for, regardless of which subset of inputs is selected. This is accomplished by using a covering array as the test case selection mechanism. However, for any given testing scenario, there are usually several alternative covering arrays that may satisfy budgetary and coverage requirements. Unfortunately, it is often unclear how to choose from these alternatives. Practitioners often want a way to explore how the input space is being covered before deciding which alternative is best suited for the testing scenario. In this presentation we will provide an overview of a new graphical method that may be used to visualize the input space of covering arrays. We use the phrase “design fractals” to refer to this graphical method. |
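For readers new to covering arrays, the sketch below checks the 2-way coverage property directly: for every pair of inputs, every combination of their settings must appear in at least one test case. The four-input, two-level test suite is an illustrative example, not one of the talk's designs.

```python
# Minimal sketch: verify 2-way (pairwise) coverage of a candidate test suite.
from itertools import combinations, product

# Each row is a test case; each column is an input with settings 0 or 1.
tests = [
    (0, 0, 0, 0),
    (0, 1, 1, 1),
    (1, 0, 1, 1),
    (1, 1, 0, 1),
    (1, 1, 1, 0),
]
levels = [2, 2, 2, 2]

uncovered = []
for i, j in combinations(range(len(levels)), 2):
    seen = {(t[i], t[j]) for t in tests}
    for pair in product(range(levels[i]), range(levels[j])):
        if pair not in seen:
            uncovered.append(((i, j), pair))

print("2-way coverage:", "complete" if not uncovered else f"missing {uncovered}")
```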
Joseph Morgan Research Statistician SAS Institute Inc |
Breakout | 2020 | |||||||||
Resilience and Productive Safety Considerations for In-Time System-Wide Safety Assurance (Abstract)
Authors: Lawrence J. Prinzel III, Jon B. Holbrook, Kyle E. Ellis, Chad L. Stephens. New innovative technologies and operational concepts will be required to meet the ever-increasing global demands on air transportation. The NASA System-Wide Safety (SWS) project is focused on how future aviation advances can meet demand needs while maintaining today’s ultra-safe system safety levels. Aviation safety as it evolves shall require new ways of thinking about safety, integrating a wide range of existing and new safety systems and practices, creating and enhancing tools and technologies, leveraging the access to system-wide data and data fusion, improving data analysis capabilities, and developing new methods for in-time risk monitoring and detection, hazard prioritization and mitigation, safety assurance decision-support, and in-time integrated system analytics [NRC, 2018]. To meet these needs, the SWS project has developed research priorities including In-time System-wide Safety Assurance (ISSA) and development of an In-time Aviation Safety Management System (IASMS) [Ellis et al., 2019]. As part of this effort, the concepts of “resilience” and “productive safety” are being studied. Traditional approaches to aviation safety have focused on what can go wrong and how to prevent it. Another approach to thinking about system safety should reflect not only “avoiding things that go wrong” (protective safety) but also “ensuring that things go right” (productive safety), which together enable a system to exhibit resilience. On-going SWS research is focused on application of these concepts for ISSA and design of IASMS. NASA identified significant challenges and research needs for ISSA and IASMS for Urban Air Mobility (UAM) [NRC, 2018] [Ellis et al., 2019]. UAM is an emerging concept of operation that features small aircraft providing on-demand transportation over relatively short distances within urban areas. The SWS project has focused on development of UAM-domain safety monitoring and alerting tools, integrated predictive technologies for UAM-level application, and adaptive in-time safety threat management [Ellis et al., 2019]. Significant research challenges include how to identify data sources and indicators for in-time safety-critical risks, how to analyze those data to detect and prioritize risks, and how to optimize safety awareness and safety action decision support. UAM is also being used to evaluate the safety paradigm of “work-as-imagined” – characterizing how people think their work is done in comparison to how work is actually done, as the two are all too often not the same. The challenges associated with ISSA and development of IASMS are significant even for existing air transportation system operations, where work-as-imagined and work-as-done can actually be compared. However, because UAM currently exists only as work-as-imagined, the safety challenges are far greater for meeting ISSA needs and designing an IASMS. The present proposal shall discuss resilience and productive safety considerations for in-time safety assurance and safety management systems for UAM. Topics include challenges of collecting productive-safety in-time data, granularity of data types and measurement, the need for new analytical methods, issues for identifying in-time productive safety metrics and indicators, and potential approaches toward quantification of resilience indices. Recommendations and future research directions shall also be described. |
Lawrence J. Prinzel III Senior Aerospace Research Engineer NASA Langley Research Center |
Breakout | 2020 | |||||||||
Physics-Informed Deep Learning for Modeling and Simulation under Uncertainty (Abstract)
Recently, a Department of Energy (DOE) report was released on the concept of scientific machine learning (SML), which is broadly defined as “a computational technology that can be trained, with scientific data, to augment or automate human skills.” [1] As the demand for machine learning (ML) in science and engineering rapidly increases, it is important to have confidence that the output of the ML algorithm is representative of the phenomena, processes, or physics being modeled. This is especially important in high-stakes fields such as defense and aerospace. In the DOE report, three research themes were highlighted with the aim of providing confidence in ML implementations. In particular, ML algorithms should be domain-aware, interpretable, and robust. Deep learning has become a ubiquitous term over the past decade due to its ability to model high-dimensional complex processes, but domain awareness, interpretability, and robustness in these large neural networks (NNs) are often hard to achieve. Recent advances in physics-informed neural networks (PINNs) are promising in that they can provide both domain awareness and a degree of interpretability [2, 3, 4]. These algorithms take advantage of the breadth of scientific knowledge built over centuries by fusing governing partial differential equations into the NN training process. In this way, PINNs output physically admissible solutions. However, PINNs are generally deterministic, meaning interpretability and robustness suffer as it is unclear how uncertainty affects the model. Another noteworthy deep learning algorithm is the generative adversarial network (GAN). GANs are capable of modeling probability distributions in both forward and inverse problems, and thus have received a flood of interest with over 15,000 citations of the seminal paper [5] in six years. A natural next step is to combine both PINNs and GANs to address all three themes laid out in [1]. The resultant physics-informed GAN (PI-GAN) is capable of both modeling physical processes and simultaneously quantifying uncertainty. A limited number of works have already demonstrated the success of PI-GANs [6, 7]. This talk will present an introduction to PI-GANs as well as an example of current NASA research implementing these networks. |
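A minimal sketch of the physics-informed idea for a toy ODE (du/dx = -u with u(0) = 1) rather than the talk's applications: the network is trained to minimize the residual of the governing equation at collocation points plus a boundary term. The architecture, optimizer settings, and use of PyTorch are illustrative assumptions; a PI-GAN would wrap a similar residual term around a generative model.

```python
# Minimal physics-informed neural network sketch for du/dx = -u, u(0) = 1.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.linspace(0.0, 2.0, 64).reshape(-1, 1).requires_grad_(True)  # collocation points
x0 = torch.zeros(1, 1)                                                # boundary point

for step in range(2000):
    opt.zero_grad()
    u = net(x)
    # du/dx at the collocation points via automatic differentiation.
    du = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
    residual = du + u                       # governing equation: du/dx + u = 0
    loss = (residual ** 2).mean() + (net(x0) - 1.0).pow(2).mean()
    loss.backward()
    opt.step()

print("u(1) ~", net(torch.tensor([[1.0]])).item(), "(exact: exp(-1) ~ 0.3679)")
```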
Patrick Leser NASA Langley Research Center |
Breakout | 2020 | |||||||||
Development of Predictive Models for Brain-Computer Interface Systems: A Case Study (Abstract)
Recent advances in brain-computer interfaces (BCI) have brought interest in exploring the possibility of BCI-like technology for future armament systems. In an experiment conducted at the Tactical Behavioral Lab (TBRL), electroencephalogram (EEG) data was collected during simulated engagement scenarios in order to study the relationship between a soldier’s state of mind and his biophysical signals. One of the goals was to determine if it was possible to anticipate the decision to fire a gun. The nature of EEG data presents a unique set of challenges. For example, the high sensitivity of EEG electrodes coupled with their close proximity on the scalp means recorded data is noisy and highly correlated. Special attention needs to be paid to data pre-processing and feature engineering in building predictive models. |
Kevin Eng Statistician US Army CCDC Armaments Center |
Breakout | 2020 | |||||||||
Employing Design of Experiments (DOE) in an Electronic Warfare Test Strategy/Design (Abstract)
Electronic warfare systems generate diverse signals in a complex environment. This briefing covers how a particular system employed DOE to develop the overall strategy, determined which tests would utilize DOE, and generated specific designs. It will cover the technical and resource challenges encountered throughout the process leading up to testing. We will discuss the difficulty in defining responses and factors and ideas on how to resolve these issues. Lastly, we will cover impacts from shortened schedules and how DOE enabled a quantitative risk assessment. |
Michael Harman Statistical Test Designer STAT COE |
Breakout | 2020 | |||||||||
Bayesian Logistic Regression with Separated Data (Abstract)
When analyzing binary responses, logistic regression is sometimes difficult when some scenarios tested result in only successes or failures; this is called separation in the data. Using typical frequentist methods, logistic regression often fails because of separation. However, Bayesian logistic regression does not suffer from this limitation. This talk will walk through a Bayesian logistic regression of data with separation using the R brms package. |
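The talk itself uses R's brms; the Python sketch below illustrates the same point with a posterior-mode (MAP) fit: under complete separation an essentially flat prior lets the slope run toward infinity, while a weakly informative normal prior keeps it finite. The data and prior scales are illustrative only.

```python
# Minimal illustration of how a prior resolves separation in logistic regression.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

# Separated data: every failure occurs below x = 0, every success above it.
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([0, 0, 0, 1, 1, 1])

def neg_log_posterior(beta, prior_sd):
    # Clip probabilities to avoid log(0) when the sigmoid saturates.
    p = np.clip(expit(beta[0] + beta[1] * x), 1e-12, 1 - 1e-12)
    log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    log_prior = -0.5 * np.sum(beta ** 2) / prior_sd ** 2   # normal prior on coefficients
    return -(log_lik + log_prior)

for prior_sd in (2.5, 1e6):     # 1e6 is effectively flat, i.e. close to the plain MLE
    fit = minimize(neg_log_posterior, x0=np.zeros(2), args=(prior_sd,))
    print(f"prior sd {prior_sd:>9}: intercept {fit.x[0]:7.2f}, slope {fit.x[1]:7.2f}")
```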
Jason Martin Test Design and Analysis Lead U.S. Army CCDC Aviation and Missile Center |
Breakout | 2020 | |||||||||
Natural Language Processing for Safety-Critical Requirements (Abstract)
Requirements specification flaws are still the biggest contributing factor to most accidents related to software. Most NASA projects have safety-critical requirements that, if implemented incorrectly, could lead to serious safety implications and/or mission-ending scenarios. There are normally thousands of system-/subsystem-/component-level requirements that need to be analyzed for safety criticality early in the project development life cycle. Manually processing such requirements is typically time-consuming and prone to error. To address this, we implemented and tested text classification models to identify requirements that are safety-critical within project documentation. We found that a naïve Bayes classifier was able to identify all safety-critical requirements with an average false positive rate of 41.35%. Future models trained on larger project requirement datasets may achieve even better performance, reducing the burden of processing requirements on safety and mission assurance personnel and improving the safety of NASA projects. |
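A minimal sketch of a naive Bayes requirement classifier of the kind described above. The requirement sentences and labels are invented stand-ins, not NASA project data, and the sketch makes no claim about the reported performance.

```python
# Minimal naive Bayes text classifier for flagging safety-critical requirements.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

requirements = [
    "The system shall terminate thrust if chamber pressure exceeds the limit.",
    "The display shall use the project standard color palette.",
    "The flight software shall inhibit deployment while personnel are present.",
    "The user manual shall be delivered in PDF format.",
]
labels = ["safety-critical", "not-critical", "safety-critical", "not-critical"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(requirements, labels)

new_req = ["The valve shall close automatically on loss of power."]
print(model.predict(new_req)[0], model.predict_proba(new_req).round(2))
```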
Ying Shi Safety and Mission Assurance NASA GSFC |
Breakout | 2020 | |||||||||
A HellerVVA Problem: The Catch-22 for Simulated Testing of Fully Autonomous Systems (Abstract)
In order to verify, validate, and accredit (VV&A) a simulation environment for testing the performance of an autonomous system, testers must examine more than just sensor physics—they must also provide evidence that the environmental features that drive system decision making are represented at all. When systems are black boxes, though, these features are fundamentally unknown, necessitating that we first test to discover these features. An umbrella known as “model induction” provides approaches for demystifying black boxes and obtaining models of their decision making, but the current state of the art assumes testers can input large quantities of operationally relevant data. When systems only make passive perceptual decisions or operate in purely virtual environments, these assumptions are typically met. However, this will not be the case for black-box, fully autonomous systems. These systems can make decisions about the information they acquire—which cannot be changed in pre-recorded passive inputs—and a major reason to obtain a decision model is to VV&A the simulation environment—preventing the valid use of a virtual environment to obtain a model. Furthermore, the current consensus is that simulation will be used to get limited safety releases for live testing. This creates a catch-22 of needing data to obtain the decision-model, but needing the decision-model to validly obtain the data. In this talk, we provide a brief overview of this challenge and possible solutions. |
Daniel Porter Research Staff Member IDA |
Breakout | 2020 | |||||||||
An end-to-end uncertainty quantification framework in predictive ocean data science (Abstract)
Because of the formidable challenge of observing the time-evolving full-depth global ocean circulation, numerical simulations play an essential role in quantifying the ocean’s role in climate variability and long-term change. For the same reason, predictive capabilities are confounded by the high-dimensional space of uncertain variables (initial conditions, internal parameters, external forcings, and model inadequacy). Bayesian inverse methods (loosely known in approximate form as data assimilation) that optimally extract and merge information from sparse, heterogeneous observations and models are powerful tools to enable rigorously calibrated and initialized predictive models to optimally learn from the sparse data. A key enabling computational approach is the use of derivative (adjoint and Hessian) methods for solving a deterministic nonlinear least-squares optimization problem. Such a parameter and state estimation system is practiced by the NASA-supported Estimating the Circulation and Climate of the Ocean (ECCO) consortium. An end-to-end system that propagates uncertainties from observational data to relevant oceanographic metrics or quantities of interest within an inverse modeling framework should address, within a joint approach, all sources of uncertainties, including those in (1) observations (measurement, sampling and representation errors), (2) the model (parametric and structural model error), (3) the data assimilation method (algorithmic approximations and data ingestion), (4) initial and boundary conditions (external forcings, bathymetry), and (5) the prior knowledge (error covariances). Here we lay out a vision for such an end-to-end framework. Recent developments in computational science and engineering are beginning to render theoretical concepts practical in real-world applications. |
Patrick Heimbach Associate Professor University of Texas at Austin |
Breakout | 2020 | |||||||||
Aliased Informed Model Selection for Nonregular Designs (Abstract)
Nonregular designs are a preferable alternative to regular resolution IV designs because they avoid confounding two-factor interactions. As a result, nonregular designs can estimate and identify a few active two-factor interactions. However, due to the sometimes complex alias structure of nonregular designs, standard screening strategies can fail to identify all active effects. In this talk, two-level nonregular screening designs with orthogonal main effects will be discussed. By utilizing knowledge of the alias structure, we propose a design-based model selection process for analyzing nonregular designs. Our Aliased Informed Model Selection (AIMS) strategy is a design-specific approach that is compared to three generic model selection methods: stepwise regression, the Lasso, and the Dantzig selector. The AIMS approach substantially increases the power to detect active main effects and two-factor interactions versus the aforementioned generic methodologies. |
Carly Metcalfe PhD Candidate Arizona State University |
Breakout | 2020 | |||||||||
International journal of astrobiology (Abstract)
The Europa Clipper mission must comply with the NASA Planetary Protection requirement in NASA Procedural Requirement (NPR) 8020.12D, which states: “The probability of inadvertent contamination of an ocean or other liquid water body must be less than 1×10⁻⁴ per mission”. Mathematical approaches designed to assess compliance with this requirement have been offered in the past, but no accepted methodology was in place to trace the end-to-end probability of contamination: from a terrestrial microorganism surviving in a non-terrestrial ocean, to the impact scenario that put it there, to potential flight system failures that led to impact, back to the initial bioburden launched with the spacecraft. As a result, hardware could presumably be either over- or under-cleaned. Over-specified microbial reduction protocols can greatly add to the cost (and schedule) of a project. On the other hand, if microbes on hardware are not sufficiently eliminated, there is increased risk of potentially contaminating another body with terrestrial organisms – adversely affecting scientific exploration and possibly conflicting with international treaty. The anticipated Mars Sample Return Campaign would be subject to a similar challenge regarding returning Martian material to Earth. A proposed requirement is “The MSR campaign shall have a probability of releasing unsterilized Martian particles, with diameters ≥ 50 nanometers (TBC), into Earth’s biosphere ≤ 1×10⁻⁶”. A similar question arises: what is required in terms of ensuring sterilization or containing Martian particles in order to meet the requirement? The mathematical framework and other interesting sensitivities that the analysis has revealed for each mission are discussed. |
Kelli Mccoy Mars Sample Return Campaign Risk Manager JPL |
Breakout | 2020 | |||||||||
Dashboard for Equipment Failure Reports (Abstract)
Equipment Failure Reports (EFRs) describe equipment failures and the steps taken as a result of these failures. EFRs contain both structured and unstructured data. Currently, analysts manually read through EFRs to understand failure modes and make recommendations to reduce future failures. This is a tedious process where important trends and information can get lost. This motivated the creation of an interactive dashboard that extracts relevant information from the unstructured (i.e. free-form text) data and combines it with structured data like failure date, corrective action and part number. The dashboard is an RShiny application that utilizes numerous text mining and visualization packages, including tm, plotly, edgebundler, and topicmodels. It allows the end-user to filter to the EFRs that they care about and visualize meta-data, such as geographic region where the failure occurred, over time allowing previously unknown trends to be seen. The dashboard also applies topic modeling to the unstructured data to identify key themes. Analysts are now able to quickly identify frequent failure modes and look at time and region-based trends in these common equipment failures. DISTRIBUTION STATEMENT A – APPROVED FOR PUBLIC RELEASE; DISTRIBUTION IS UNLIMITED. |
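A minimal Python sketch of the topic-modeling step described above (the dashboard itself is an RShiny application built on packages such as topicmodels). The free-text snippets are invented examples of failure narratives.

```python
# Minimal sketch: extract key failure themes from free-text reports with LDA.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reports = [
    "hydraulic pump seal leaked fluid during operation",
    "pump motor overheated and shut down",
    "corrosion found on connector pins after inspection",
    "connector pins bent during maintenance causing intermittent signal",
    "seal replaced after fluid leak detected near pump housing",
]

vec = CountVectorizer(stop_words="english")
counts = vec.fit_transform(reports)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]   # top words per theme
    print(f"topic {k}: {', '.join(top)}")
```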
Cole Molloy Johns Hopkins Applied Physics Lab |
Breakout | 2020 | |||||||||
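The dashboard itself is an RShiny application built on tm, plotly, edgebundler, and topicmodels; the sketch below shows only the topic-modeling idea, in Python, on a handful of invented failure narratives.

```python
# Minimal Python sketch of the topic-modeling step (the actual dashboard uses R:
# RShiny, tm, topicmodels). The example failure narratives below are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

efr_text = [
    "hydraulic pump leak observed during preflight inspection, pump replaced",
    "radio intermittent static, antenna connector corroded, connector cleaned",
    "hydraulic line chafing near actuator, line rerouted and clamped",
    "display flickers at low temperature, power supply board reseated",
    "pump pressure drop in hydraulic system, seal kit installed",
    "no audio on channel two, wiring harness repaired at antenna feed",
]

vec = CountVectorizer(stop_words="english")
dtm = vec.fit_transform(efr_text)                 # document-term matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[::-1][:5]
    print(f"topic {k}: " + ", ".join(terms[i] for i in top))
```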
Designing Competitions that Reward Robust Algorithm Performance (Abstract)
Supervised learning competitions such as those hosted by Kaggle provide a quantitative, objective way to evaluate and compare algorithms. However, the typical format of evaluating competitors against a fixed set of data can encourage overfitting to that particular set. If the competition host is interested in motivating development of approaches that will perform well against a more general “in the wild” problem outside of the competition, the winning algorithms of these competitions might fall short because they are tuned too closely to the competition data. Furthermore, the idea of having a single final ranking of the competitors based on the competition data set ignores the possibility that the ranking might change substantially if a different data set were used, even one with the same statistical characteristics as the original. We present an approach for designing training and test sets that reward more robust algorithms and discourage overfitting, with the intent of improved performance for scenarios beyond the competition. These carefully designed sets also enable a rich and nuanced analysis and comparison of the performance of competing algorithms, including a more flexible final ranking. We illustrate these methods with two competitions recently designed and hosted by Los Alamos National Laboratory to improve detection, identification, and location of radiological threats in urban environments. |
Kary Myers Deputy Group Leader, Statistical Sciences Group Los Alamos National Laboratory |
Breakout | 2020 | |||||||||
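A minimal sketch of the resampling idea behind the entry above: score a synthetic field of competitors on many plausible test splits rather than one fixed set, and report rank distributions instead of a single ranking. The score matrix is simulated, not competition data, and the splitting scheme is only illustrative.

```python
# Sketch of ranking competitors over many resampled test sets instead of one
# fixed set; the per-event scores below are synthetic, not competition data.
import numpy as np

rng = np.random.default_rng(7)
n_events, n_teams, n_resamples = 400, 5, 200

# Synthetic per-event scores: team 0 is slightly better on average,
# team 4 is overfit (strong on half the events, weak on the rest).
scores = rng.normal(loc=0.70, scale=0.15, size=(n_teams, n_events))
scores[0] += 0.03
scores[4, : n_events // 2] += 0.10
scores[4, n_events // 2 :] -= 0.12

ranks = np.zeros((n_resamples, n_teams), dtype=int)
for b in range(n_resamples):
    idx = rng.choice(n_events, size=n_events // 2, replace=False)  # a plausible test split
    mean_score = scores[:, idx].mean(axis=1)
    ranks[b] = (-mean_score).argsort().argsort() + 1               # rank 1 = best

for t in range(n_teams):
    counts = np.bincount(ranks[:, t], minlength=n_teams + 1)[1:]
    print(f"team {t}: median rank {int(np.median(ranks[:, t]))}, "
          f"rank distribution {counts.tolist()}")
```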
Post-hoc Uncertainty Quantification for Remote Sensing Observing Systems (Abstract)
The ability of spaceborne remote sensing data to address important Earth and climate science problems rests crucially on how well the underlying geophysical quantities can be inferred from these observations. Remote sensing instruments measure parts of the electromagnetic spectrum and use computational algorithms to infer the unobserved true physical states. However, the accompanying uncertainties, if they are provided at all, are usually incomplete. There are many reasons for this, including unknown physics, computational artifacts and compromises, and unknown uncertainties in the inputs. In this talk, I will describe a practical methodology for uncertainty quantification of physical state estimates derived from remote sensing observing systems. The method we propose combines Monte Carlo simulation experiments with statistical modeling to approximate conditional distributions of unknown true states given point estimates produced by imperfect operational algorithms. Our procedure is carried out post hoc, that is, after the operational processing step, because it is not feasible to redesign and rerun operational code. I demonstrate the procedure using four months of data from NASA’s Orbiting Carbon Observatory-2 mission, and compare our results to those obtained by validation against data from the Total Carbon Column Observing Network, where it exists. |
Amy Braverman Principal Statistician Jet Propulsion Laboratory, California Institute of Technology |
Breakout | 2020 | |||||||||
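A toy version of the post-hoc idea described above: simulate true states, push them through a deliberately imperfect "operational" estimator, and then characterize the conditional distribution of the true state given the estimate from the Monte Carlo pairs. The forward model, estimator, and error structure are stand-ins, not OCO-2 processing.

```python
# Toy sketch of post-hoc UQ: simulate true states, run an imperfect "operational"
# retrieval, then summarize p(true | estimate) from the Monte Carlo pairs.
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

x_true = rng.normal(400.0, 10.0, size=n)                 # true state (XCO2-like, ppm; hypothetical)
radiance = 2.0 * x_true + rng.normal(0.0, 8.0, size=n)   # simplified observation
x_hat = 0.48 * radiance + 8.0                            # imperfect operational estimator (biased)

# Approximate p(x_true | x_hat) within bins of the operational estimate.
bins = np.quantile(x_hat, np.linspace(0, 1, 21))
which = np.clip(np.digitize(x_hat, bins) - 1, 0, 19)
for b in (2, 10, 17):
    sel = which == b
    mu, sd = x_true[sel].mean(), x_true[sel].std()
    print(f"x_hat in [{bins[b]:.1f}, {bins[b+1]:.1f}]: "
          f"E[x_true | x_hat] = {mu:.1f}, sd = {sd:.2f}")
```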
Bayesian Experimental Design to augment a sensor network (Abstract)
The problem of augmenting an existing sensor network can be solved by considering how best to answer the question of interest. In nuclear explosion monitoring, this could mean deciding where best to place a new seismic sensor so as to most accurately and precisely estimate the unknown location of a seismic event. In this talk, we will solve this problem using a Bayesian Design of Experiments (DoE) approach in which each design consists of the existing sensor network coupled with a possible new location. We will incorporate complex computer simulation and experimental data to rank the possible sensor locations with respect to their ability to provide accurate and precise event location estimates. This will result in a map of desirability for new locations, so that if the most highly ranked location is not available or possible, regions of similar predictability can be identified. The Bayesian DoE approach also allows for incorporating denied-access locations due to either geographical or political constraints. |
Emily Casleton Staff Scientist Los Alamos National Laboratory |
Breakout | 2020 | |||||||||
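As a greatly simplified stand-in for the Bayesian design ranking described above, the sketch below scores each candidate sensor site by the average log-determinant of the Fisher information for a 2-D event location under a straight-line travel-time model, averaged over a prior sample of events. The geometry, wave speed, and timing noise are hypothetical, and the talk's approach additionally incorporates complex simulation and experimental data rather than this D-optimality proxy.

```python
# Simplified proxy for the sensor-augmentation ranking: score each candidate site
# by average log-det Fisher information for 2-D event location under a straight-line
# travel-time model. Geometry, wave speed, and noise levels are hypothetical.
import numpy as np

rng = np.random.default_rng(5)
v, sigma = 6.0, 0.1                                   # km/s wave speed, s timing noise

existing = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])             # current network (km)
candidates = np.array([[100.0, 100.0], [50.0, 50.0], [150.0, 50.0], [-50.0, 50.0]])
events = rng.uniform(20.0, 80.0, size=(500, 2))       # prior sample of event locations

def logdet_fisher(sensors, event):
    d = event - sensors                                          # vectors sensor -> event
    g = d / (np.linalg.norm(d, axis=1, keepdims=True) * v)       # gradients of travel time
    J = g.T @ g / sigma**2                                       # 2x2 Fisher information
    return np.linalg.slogdet(J)[1]

base = np.mean([logdet_fisher(existing, e) for e in events])
for c in candidates:
    net = np.vstack([existing, c])
    gain = np.mean([logdet_fisher(net, e) for e in events]) - base
    print(f"candidate {c}: mean log-det gain = {gain:.2f}")
```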
Calibrate, Emulate, Sample (Abstract)
The calibration of complex models to data is both a challenge and an opportunity. It can be posed as an inverse problem. This work focuses on the interface of Ensemble Kalman algorithms used for inversion or posterior sampling (EKI/EKS), Gaussian process emulation (GPE), and Markov chain Monte Carlo (MCMC) for the calibration of, and quantification of uncertainty in, parameters learned from data. The goal is to perform uncertainty quantification in predictions made from complex models, reflecting uncertainty in these parameters, with relatively few computationally expensive forward model evaluations. This is achieved by propagating approximate posterior samples obtained by judicious combination of ideas from EKI/EKS, GPE, and MCMC. The strategy will be illustrated with idealized models related to climate modeling. |
Alfredo Garbuno-Inigo Postdoctoral Scholar Caltech |
Breakout | 2020 | |||||||||
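A minimal emulate-then-sample loop in the spirit of the entry above, on a toy scalar model: train a Gaussian process emulator on a few forward-model runs, then run random-walk Metropolis against the cheap emulator instead of the expensive model. The ensemble Kalman (EKI/EKS) stage of the full pipeline is omitted, and the forward model, prior, and data are invented.

```python
# Minimal emulate-then-sample sketch on a toy scalar model. The EKI/EKS stage of the
# full Calibrate-Emulate-Sample pipeline is omitted; all data here are invented.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def forward(theta):                        # "expensive" forward model (toy stand-in)
    return np.sin(theta) + 0.5 * theta

rng = np.random.default_rng(2)
theta_train = np.linspace(-3, 3, 12).reshape(-1, 1)          # a few expensive runs
gp = GaussianProcessRegressor(kernel=RBF(1.0), alpha=1e-6).fit(
    theta_train, forward(theta_train).ravel())

theta_true, noise = 1.2, 0.1
y_obs = forward(theta_true) + rng.normal(0, noise)

def log_post(theta):                       # flat prior on [-3, 3], Gaussian likelihood
    if not -3.0 <= theta <= 3.0:
        return -np.inf
    pred = gp.predict(np.array([[theta]]))[0]
    return -0.5 * ((y_obs - pred) / noise) ** 2

samples, cur, lp_cur = [], 0.0, log_post(0.0)
for _ in range(5000):                      # random-walk Metropolis on the cheap emulator
    prop = cur + rng.normal(0, 0.3)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp_cur:
        cur, lp_cur = prop, lp_prop
    samples.append(cur)

post = np.array(samples[1000:])
print(f"posterior mean {post.mean():.2f} +/- {post.std():.2f} (truth {theta_true})")
```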
Unlock Trends in Instrument Performance with Recurrence Studies and Text Analyses (Abstract)
Recurrence analysis models the frequency of recurrent events, such as breakdowns or repairs, to obtain the total repairs per unit as a function of time. Text analytics is used to extract useful summary information from maintenance records. The coupling of these techniques results in dynamic, actionable information used to ensure optimal management of test instruments. This session provides a practical example of recurrence analysis and text analytics applied to data collected from multiple units of chromatography instrumentation. The analyses include the liberal use of data visualization and practical interpretation of results, intended to help attendees extract the best performance from their test equipment. Testing and evaluation typically involves the use of sophisticated instrumentation. The information these instruments contribute must be precise, accurate, and reliable because critical decisions are made from the data. Regular repairs and maintenance are needed to ensure robust instruments that operate properly. Recurrence analysis is used to obtain a mean cumulative function (MCF) to better understand instrument performance over time. The MCF can be used to estimate maintenance costs, explain repair tendencies, and compare units. Test instrumentation typically includes a large amount of documentation within use logs and maintenance records. Text analytics is used to quickly summarize documented information into word clouds, document-term matrices, or clusters for enhanced understanding. Common themes that arise from text analytics help to focus preventive maintenance and to ensure that the most needed parts and resources are available to mitigate testing delays. Latent semantic analysis enhances this information by revealing the topic vectors present within the documents. The dynamic linking of these techniques provides an optimal understanding of the performance of test instrumentation over the life of the equipment. The results of the combined analyses allow for data-driven maintenance planning, sound justification for instrument replacement, and the assurance of robust test results. |
Rob Lievense Systems Engineer SAS/JMP |
Breakout | 2020 | |||||||||
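The nonparametric mean cumulative function at the heart of the recurrence analysis above can be computed in a few lines. The repair histories below are invented; reliability software such as JMP produces the same estimate along with confidence bounds.

```python
# Nonparametric mean cumulative function (MCF) for recurrent repairs with per-unit
# censoring; the repair histories below are invented.
import numpy as np

# (repair times in months, end-of-observation time) for each instrument
units = [
    ([3.0, 9.0, 14.0],  24.0),
    ([7.0, 19.0],       22.0),
    ([2.0, 5.0, 11.0],  18.0),
    ([12.0],            24.0),
]

event_times = sorted({t for repairs, _ in units for t in repairs})
cum = 0.0
for t in event_times:
    at_risk = sum(1 for _, cens in units if cens >= t)        # units still under observation
    d = sum(repairs.count(t) for repairs, _ in units)         # repairs at time t
    cum += d / at_risk
    print(f"t = {t:5.1f} months   MCF = {cum:.2f} repairs/unit")
```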
Bayesian Calibration and Uncertainty Analysis: A Case Study Using a 2-D CFD Turbulence Model (Abstract)
The growing use of simulations in the engineering design process promises to reduce the need for extensive physical testing, decreasing both development time and cost. However, as mathematician and statistician George E. P. Box said, “Essentially, all models are wrong, but some are useful.” There are many factors that determine simulation or, more broadly, model accuracy. These factors can be condensed into noise, bias, parameter uncertainty, and model form uncertainty. To counter these effects and ensure that models faithfully match reality to the extent required, simulation models must be calibrated to physical measurements. Further, the models must be validated, and their accuracy must be quantified before they can be relied on in lieu of physical testing. Bayesian calibration provides a solution for both requirements: it optimizes tuning of model parameters to improve simulation accuracy, and estimates any remaining discrepancy, which is useful for model diagnosis and validation. Also, because model discrepancy is assumed to exist in this framework, it enables robust calibration even for inaccurate models. In this paper, we present a case study to investigate the potential benefits of using Bayesian calibration, sensitivity analyses, and Monte Carlo analyses for model improvement and validation. We will calibrate a 7-parameter k-𝜎 CFD turbulence model simulated in COMSOL Multiphysics®. The model predicts coefficient of lift and drag for an airfoil defined using a 6049-series airfoil parameterization from the National Advisory Committee for Aeronautics (NACA). We will calibrate model predictions using publicly available wind tunnel data from the University of Illinois Urbana-Champaign’s (UIUC) database. Bayesian model calibration requires intensive sampling of the simulation model to determine the most likely distribution of calibration parameters, which can be a large computational burden. We greatly reduce this burden by following a surrogate modeling approach, using Gaussian process emulators to mimic the CFD simulation. We train the emulator by sampling the simulation space using a Latin Hypercube Design (LHD) of Experiments (DOE), and assess the accuracy of the emulator using leave-one-out Cross Validation (CV) error. The Bayesian calibration framework involves calculating the discrepancy between simulation results and physical test results. We also use Gaussian process emulators to model this discrepancy. The discrepancy emulator will be used as a tool for model validation; characteristic trends in residual errors after calibration can indicate underlying model form errors that were not addressed by tuning the model calibration parameters. In this way, we will separate and quantify model form uncertainty and parameter uncertainty. The results of a Bayesian calibration include a posterior distribution of calibration parameter values. These distributions will be sampled using Monte Carlo methods to generate model predictions, whereby new predictions have a distribution of values that reflects the uncertainty in the tuned calibration parameters. The resulting output distributions will be compared against physical data and the uncalibrated model to assess the effects of the calibration and discrepancy model. We will also perform global, variance-based sensitivity analysis on the uncalibrated model and the calibrated models, and investigate any changes in the sensitivity indices from uncalibrated to calibrated. |
Peter Chien | Contributed | 2018 | |||||||||
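A sketch of just the surrogate-modeling step described above: a Latin hypercube sample of a stand-in two-parameter function, a Gaussian process emulator, and its leave-one-out cross-validation error. The CFD model, its seven calibration parameters, and the UIUC wind-tunnel data are not reproduced here.

```python
# Surrogate-modeling step only: a Latin hypercube sample of a stand-in two-parameter
# function, a Gaussian process emulator, and leave-one-out cross-validation error.
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def simulator(x):                          # stand-in for the expensive CFD run
    return np.sin(3 * x[:, 0]) * np.exp(-x[:, 1]) + 0.5 * x[:, 1]

X = qmc.LatinHypercube(d=2, seed=4).random(n=40)     # LHD over the unit square
y = simulator(X)

loo_err = []
for i in range(len(X)):                              # leave-one-out cross validation
    keep = np.arange(len(X)) != i
    gp = GaussianProcessRegressor(kernel=RBF([0.2, 0.2]), alpha=1e-6,
                                  normalize_y=True).fit(X[keep], y[keep])
    loo_err.append(gp.predict(X[i:i + 1])[0] - y[i])

loo_err = np.array(loo_err)
print(f"LOO RMSE = {np.sqrt(np.mean(loo_err**2)):.4f} (response sd = {y.std():.4f})")
```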
Assessing Human Visual Inspection for Acceptance Testing: An Attribute Agreement Analysis Case Study (Abstract)
In today’s manufacturing, inspection, and testing world, understanding the capability of the measurement system being used via the use of Measurement Systems Analyses (MSA) is a crucial activity that provides the foundation for the use of Design of Experiments (DOE) and Statistical Process Control (SPC). Although undesirable, there are times when human observation is the only measurement system available. In these types of situations, traditional MSA tools are often ineffectual due to the nature of the data collected. When there are no other alternatives, we need some method for assessing the adequacy and effectiveness of the human observations. When multiple observers are involved, Attribute Agreement Analyses are a powerful tool for quantifying the Agreement and Effectiveness of a visual inspection system. This talk will outline best practices and rules of thumb for Attribute Agreement Analyses, and will highlight a recent Army case study to further demonstrate the tool’s use and potential. |
Christopher Drake Lead Statistician QE&SA Statistical Methods & Analysis Group |
Contributed | materials | 2018 | ||||||||
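One common attribute agreement statistic, Fleiss' kappa, can be computed by hand for a hypothetical accept/reject inspection with three appraisers; the ratings below are invented and are not the Army case-study data.

```python
# Fleiss' kappa for a hypothetical accept/reject visual inspection with 3 appraisers.
import numpy as np

# counts[i, j] = number of appraisers assigning part i to category j (0 = reject, 1 = accept)
counts = np.array([
    [3, 0], [0, 3], [1, 2], [0, 3], [3, 0],
    [2, 1], [0, 3], [0, 3], [1, 2], [0, 3],
])
n_parts, n_raters = counts.shape[0], counts.sum(axis=1)[0]

p_j = counts.sum(axis=0) / (n_parts * n_raters)                  # category proportions
P_i = (np.sum(counts**2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
P_bar, P_e = P_i.mean(), np.sum(p_j**2)

kappa = (P_bar - P_e) / (1 - P_e)
print(f"observed agreement = {P_bar:.3f}, chance agreement = {P_e:.3f}, "
      f"Fleiss' kappa = {kappa:.3f}")
```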
Leveraging Anomaly Detection for Aircraft System Health Data Stability Reporting (Abstract)
Detecting and diagnosing aircraft system health poses a unique challenge as system complexity increases and software is further integrated. Anomaly detection algorithms systematically highlight unusual patterns in large datasets and are a promising methodology for monitoring aircraft system health. The F-35A fighter aircraft is driven by complex, integrated subsystems with both software and hardware components. The F-35A operational flight program is the software that manages each subsystem within the aircraft and the flow of required information and support between subsystems. This information and support are critical to the successful operation of many subsystems. For example, the radar system supplies information to the fusion engine, without which the fusion engine would fail. ACC operational testing can be thought of as equivalent to beta testing for operational flight programs. As in other software, many faults result in minimal loss of functionality and are often unnoticed by the user. However, there are times when a software fault might result in catastrophic functionality loss (i.e., subsystem shutdown). It is critical to catch software problems that will result in catastrophic functionality loss before the flight software is fielded to the combat air forces. Subsystem failures and degradations can be categorized and quantified using simple system health data codes (e.g., degrade, fail, healthy). However, because of the integrated nature of the F-35A, a subsystem degradation may be caused by another subsystem. The 59th Test and Evaluation Squadron collects autonomous system data, pilot questionnaires, and health report codes for F-35A subsystems. Originally, this information was analyzed using spreadsheet tools (i.e., Microsoft Excel). Using this method, analysts were unable to examine all subsystems or attribute cause for subsystem faults. The 59 TES is developing a new process that leverages anomaly detection algorithms to isolate flights with unusual patterns of subsystem failures and, within those flights, highlight which subsystem faults are correlated with increased subsystem failures. This presentation will compare the performance of several anomaly detection algorithms (e.g., K-means, K-nearest neighbors, support vector machines) using simulated F-35A data. |
Kyle Gartrell | Contributed | materials | 2018 | ||||||||
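A sketch of the anomaly-scoring idea on simulated per-flight fault-count features, using distance to the nearest k-means centroid as the score (one of several algorithms the presentation compares). The data are simulated, not F-35A records, and the subsystem structure is invented.

```python
# Anomaly scoring of simulated per-flight fault counts via distance to the nearest
# k-means centroid; all data are simulated, not F-35A records.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(11)
# columns: fault counts per flight for a handful of subsystems
normal = rng.poisson(lam=[2, 1, 3, 1, 2], size=(300, 5))
anomalous = rng.poisson(lam=[2, 9, 3, 1, 8], size=(5, 5))    # unusual fault pattern
X = np.vstack([normal, anomalous]).astype(float)

Xs = StandardScaler().fit_transform(X)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(Xs)
score = np.min(km.transform(Xs), axis=1)          # distance to nearest centroid

top = np.argsort(score)[::-1][:5]
print("flights flagged as most anomalous:", top.tolist())
print("true anomalous flight indices:    ", list(range(300, 305)))
```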
Application of Adaptive Sampling to Advance the Metamodeling and Uncertainty Quantification Process (Abstract)
Over the years the aerospace industry has continued to implement design of experiments and metamodeling (e.g., response surface methodology) in order to shift the knowledge curve forward in the systems design process. While the adoption of these methods is still incomplete across aerospace sub-disciplines, they comprise the state of the art during systems design and for design evaluation using modeling and simulation or ground testing. In the context of modeling and simulation, as the national high performance computing infrastructure becomes more capable, so do the demands placed on those resources in terms of simulation fidelity and number of researchers. Furthermore, with recent emphasis placed on the uncertainty quantification of aerospace system design performance, the number of simulation cases needed to properly characterize a system’s uncertainty across the entire design space increases by orders of magnitude, further stressing available resources. This leads to advanced development groups either sticking to ad hoc estimates of uncertainty (e.g., subject matter expert estimates based on experience) or neglecting uncertainty quantification altogether. Advancing the state of the art of aerospace systems design and evaluation requires a practical adaptive sampling scheme that responds to the characteristics of the underlying design or uncertainty space. For example, when refining a system metamodel gradually, points should be chosen for design variable combinations that are located in high curvature regions or where metamodel uncertainty is the greatest. The latter method can be implemented by defining a functional form of the metamodel variance and using it to define the next best point to sample. For schemes that require n points to be sampled simultaneously, considerations can be made to ensure proper sample dispersion. The implementation of adaptive sampling schemes in the design and evaluation process will enable similar fidelity with fewer samples of the design space compared to fixed or ad hoc sampling methods (i.e., shorter time or fewer human resources required). Alternatively, the uncertainty of the design space can be reduced to a greater extent for the same number of samples, or with fewer samples using higher fidelity simulations. The purpose of this presentation will be to examine the benefits of adaptive sampling as applied to challenging design problems. Emphasis will be placed on methods that are accessible to engineering practitioners. |
Erik Axdahl Hypersonic Airbreathing Propulsion Branch NASA Langley Research Center |
Contributed | materials | 2018 | ||||||||
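A minimal version of the variance-driven scheme described above: refine a Gaussian process metamodel by repeatedly sampling wherever its predictive standard deviation is largest. The one-dimensional objective below is a stand-in for an expensive simulation.

```python
# Adaptive sampling sketch: add points where the GP metamodel's predictive
# standard deviation is largest. The objective is a stand-in for an expensive run.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_sim(x):
    return np.sin(8 * x) + 0.3 * x          # stand-in response

rng = np.random.default_rng(6)
X = rng.uniform(0, 1, size=(4, 1))          # small initial design
y = expensive_sim(X).ravel()
grid = np.linspace(0, 1, 201).reshape(-1, 1)

gp = GaussianProcessRegressor(kernel=RBF(0.2), alpha=1e-8, normalize_y=True)
for _ in range(10):                         # adaptive refinement loop
    gp.fit(X, y)
    _, std = gp.predict(grid, return_std=True)
    x_next = grid[np.argmax(std)]           # sample where the metamodel is least certain
    X = np.vstack([X, [x_next]])
    y = np.append(y, expensive_sim(x_next)[0])

gp.fit(X, y)
resid = gp.predict(grid) - expensive_sim(grid).ravel()
print(f"{len(X)} runs, max |metamodel error| on grid = {np.abs(resid).max():.3f}")
```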
Application of Adaptive Sampling to Advance the Metamodeling and Uncertainty Quantification Process (Abstract)
Over the years the aerospace industry has continued to implement design of experiments and metamodeling (e.g., response surface methodology) in order to shift the knowledge curve forward in the systems design process. While the adoption of these methods is still incomplete across aerospace sub-disciplines, they comprise the state of the art during systems design and for design evaluation using modeling and simulation or ground testing. In the context of modeling and simulation, as the national high performance computing infrastructure becomes more capable, so do the demands placed on those resources in terms of simulation fidelity and number of researchers. Furthermore, with recent emphasis placed on the uncertainty quantification of aerospace system design performance, the number of simulation cases needed to properly characterize a system’s uncertainty across the entire design space increases by orders of magnitude, further stressing available resources. This leads to advanced development groups either sticking to ad hoc estimates of uncertainty (e.g., subject matter expert estimates based on experience) or neglecting uncertainty quantification altogether. Advancing the state of the art of aerospace systems design and evaluation requires a practical adaptive sampling scheme that responds to the characteristics of the underlying design or uncertainty space. For example, when refining a system metamodel gradually, points should be chosen for design variable combinations that are located in high curvature regions or where metamodel uncertainty is the greatest. The latter method can be implemented by defining a functional form of the metamodel variance and using it to define the next best point to sample. For schemes that require n points to be sampled simultaneously, considerations can be made to ensure proper sample dispersion. The implementation of adaptive sampling schemes in the design and evaluation process will enable similar fidelity with fewer samples of the design space compared to fixed or ad hoc sampling methods (i.e., shorter time or fewer human resources required). Alternatively, the uncertainty of the design space can be reduced to a greater extent for the same number of samples, or with fewer samples using higher fidelity simulations. The purpose of this presentation will be to examine the benefits of adaptive sampling as applied to challenging design problems. Emphasis will be placed on methods that are accessible to engineering practitioners. |
Robert Baurle Hypersonic Airbreathing Propulsion Branch NASA Langley Research Center |
Contributed | materials | 2018 | ||||||||
Application of Statistical Methods and Designed Experiments to Development of Technical Requirements (Abstract)
The Army relies heavily on the voice of the customer to develop and refine technical requirements for developmental systems, but too often the approach is reactive. The ARDEC (Armament Research, Development & Engineering Center) Statistics Group at Picatinny Arsenal, NJ, working closely with subject matter experts, has been implementing market research and web development techniques and Design of Experiments (DOE) best practices to design and analyze surveys that provide insight into the customer’s perception of utility for various developmental commodities. Quality organizations tend to focus on ensuring products meet technical requirements, with far less emphasis placed on whether or not the specification actually captures customer needs. The employment of techniques and best practices spanning the fields of Market Research, Design of Experiments, and Web Development (choice design, conjoint analysis, contingency analysis, psychometric response scales, stratified random sampling) converges toward a more proactive and risk-mitigating approach to the development of technical and training requirements, and encourages strategic decision-making when faced with the inarticulate nature of human preference. Establishing a hierarchy of customer preference for objective and threshold values of key performance parameters enriches the development process of emerging systems by making the process simultaneously more effective and more efficient. |
Eli Golden U.S. ARMY ARMAMENT RESEARCH, DEVELOPMENT & ENGINEERING CENTER |
Contributed | materials | 2018 | ||||||||
Infrastructure Lifetimes (Abstract)
Infrastructure refers to the structures, utilities, and interconnected roadways that support the work carried out at a given facility. In the case of the Lawrence Livermore National Laboratory, infrastructure is considered exclusive of scientific apparatus and safety and security systems. LLNL inherited its infrastructure management policy from the University of California, which managed the site during LLNL’s first 5 decades. This policy is quite different from that used in commercial property management. Commercial practice weighs reliability over cost by replacing infrastructure at industry-standard lifetimes. LLNL practice weighs overall lifecycle cost, seeking to mitigate reliability issues through inspection. To formalize this risk management policy, a careful statistical study was undertaken using 20 years of infrastructure replacement data. In this study, care was taken to adjust for left truncation as well as right censoring. 57 distinct infrastructure class data sets were fitted using MLE to the Generalized Gamma distribution. This distribution is useful because it produces a weighted blending of discrete failure (Weibull model) and complex system failure (Lognormal model). These parametric fittings then yielded median lifetimes and conditional probabilities of failure. From these conditional probabilities, bounds on budget costs could be computed as expected values. This has provided a scientific basis for rational budget management as well as aiding operations by prioritizing inspection, repair, and replacement activities. |
Erika Taketa Lawrence Livermore National Laboratory |
Contributed | materials | 2018 | ||||||||
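A sketch of the fitting step described above: maximum likelihood for a generalized gamma lifetime model with right censoring and left truncation, applied to synthetic replacement records (not the LLNL data), using scipy's gengamma(a, c, scale) parameterization.

```python
# Generalized gamma MLE with right censoring and left truncation on synthetic data.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(8)
true = dict(a=2.0, c=1.5, scale=15.0)                 # hypothetical "true" lifetimes (years)
life = stats.gengamma.rvs(true["a"], true["c"], scale=true["scale"],
                          size=300, random_state=rng)

entry = rng.uniform(0, 20, size=300)                  # age when the observation window opened
seen = life > entry                                   # left truncation: only items alive at entry
life, entry = life[seen], entry[seen]
window_end = entry + 20.0                             # 20-year observation window
observed = life <= window_end
t = np.where(observed, life, window_end)              # failure age or right-censoring age

def negloglik(params):
    a, c, scale = np.exp(params)                      # keep parameters positive
    ll = np.where(observed,
                  stats.gengamma.logpdf(t, a, c, scale=scale),
                  stats.gengamma.logsf(t, a, c, scale=scale))
    ll -= stats.gengamma.logsf(entry, a, c, scale=scale)   # condition on surviving to entry
    return -np.sum(ll)

res = optimize.minimize(negloglik, x0=np.log([1.0, 1.0, 10.0]), method="Nelder-Mead")
a, c, scale = np.exp(res.x)
print(f"fitted a={a:.2f}, c={c:.2f}, scale={scale:.1f};  "
      f"median life = {stats.gengamma.median(a, c, scale=scale):.1f} years")
```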
Infrastructure Lifetimes (Abstract)
Infrastructure refers to the structures, utilities, and interconnected roadways that support the work carried out at a given facility. In the case of the Lawrence Livermore National Laboratory, infrastructure is considered exclusive of scientific apparatus and safety and security systems. LLNL inherited its infrastructure management policy from the University of California, which managed the site during LLNL’s first 5 decades. This policy is quite different from that used in commercial property management. Commercial practice weighs reliability over cost by replacing infrastructure at industry-standard lifetimes. LLNL practice weighs overall lifecycle cost, seeking to mitigate reliability issues through inspection. To formalize this risk management policy, a careful statistical study was undertaken using 20 years of infrastructure replacement data. In this study, care was taken to adjust for left truncation as well as right censoring. 57 distinct infrastructure class data sets were fitted using MLE to the Generalized Gamma distribution. This distribution is useful because it produces a weighted blending of discrete failure (Weibull model) and complex system failure (Lognormal model). These parametric fittings then yielded median lifetimes and conditional probabilities of failure. From these conditional probabilities, bounds on budget costs could be computed as expected values. This has provided a scientific basis for rational budget management as well as aiding operations by prioritizing inspection, repair, and replacement activities. |
William Romine Lawrence Livermore National Laboratory |
Contributed | materials | 2018 | ||||||||
Workforce Analytics (Abstract)
Several statistical methods have been used effectively to model workforce behavior, specifically attrition due to retirement and voluntary separation[1]. Additionally, various authors have introduced career development[2] as a meaningful aspect of workforce planning. While both general and more specific attrition modeling techniques yield useful results, only limited success has followed attempts to quantify career stage transition probabilities. A complete workforce model would include quantifiable flows both vertically and horizontally in the network described pictorially, at a single time point, in Figure 1. The horizontal labels in Figure 1 convey one possible meaning assignable to career stage transition – in this case, competency. More formal examples might include rank within a hierarchy, such as in a military organization, or grade in a civil service workforce. In the case of the Nuclear Weapons labs, knowing that the specialized, classified knowledge needed to deal with Stockpile Stewardship is being preserved, as evidenced by the production of Masters (individuals capable of independent technical work), is also of interest to governmental oversight. In this paper we examine the allocation of labor involved in a specific Life Extension Program at LLNL. This growing workforce is described by discipline and career stage to determine how well the Norden-Rayleigh development cost model[3] fits the data. Since this model underlies much budget estimation within both DOD and NNSA, the results should be of general interest. Data is also examined as a possible basis for quantifying horizontal flows in Figure 1. |
William Romine Lawrence Livermore National Laboratory |
Contributed | 2018 | |||||||||
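The Norden-Rayleigh staffing profile referenced above, p(t) = 2Kat·exp(−at²), can be fitted to headcount data by nonlinear least squares; the monthly staffing series below is invented, not the LLNL life-extension program data.

```python
# Fit the Norden-Rayleigh staffing profile p(t) = 2*K*a*t*exp(-a*t^2) to
# hypothetical monthly headcount data by nonlinear least squares.
import numpy as np
from scipy.optimize import curve_fit

def norden_rayleigh(t, K, a):
    return 2.0 * K * a * t * np.exp(-a * t**2)

rng = np.random.default_rng(9)
t = np.arange(1, 61) / 12.0                                   # months 1..60, in years
true_K, true_a = 400.0, 0.35                                  # total effort (person-years), shape
staff = norden_rayleigh(t, true_K, true_a) + rng.normal(0, 5, size=t.size)

(K_hat, a_hat), _ = curve_fit(norden_rayleigh, t, staff, p0=[300.0, 0.2])
t_peak = 1.0 / np.sqrt(2.0 * a_hat)                           # peak staffing occurs at 1/sqrt(2a)
print(f"K = {K_hat:.0f} person-years, a = {a_hat:.3f}, peak staffing at {t_peak:.2f} years")
```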
Optimizing for Mission Success in Highly Uncertain Scenarios (Abstract)
Optimization under uncertainty increases the complexity of a problem as well as the computing resources required to solve it. As the amount of uncertainty increases, these difficulties are exacerbated. However, when optimizing for mission-level objectives, rather than component- or system-level objectives, an increase in uncertainty is inevitable. Previous research has produced methods for optimization under uncertainty, such as robust design optimization and reliability-based design optimization. These are generally executed at a product component quality level, to minimize variability and stay within design tolerances, but are not tailored to capture the high amount of variability in a mission-level problem. In this presentation, an approach for formulating and solving highly stochastic mission-level optimization problems is described. A case study is shown using an unmanned aerial system (UAS) on a search mission while an “enemy” UAS attempts to interfere. This simulation, modeled in the Unity Game Engine, has highly stochastic outputs, where the time to mission success varies by multiple orders of magnitude, but the ultimate goal is a binary output representing mission success or failure. The results demonstrate the capabilities and challenges of optimization in these types of mission scenarios. |
Brian Chell | Contributed | 2018 | |||||||||
Test Planning for Observational Studies using Poisson Process Modeling (Abstract)
Operational Test (OT) is occasionally conducted after a system is already fielded. Unlike in a traditional test based on Design of Experiments (DOE) principles, it is often not possible to vary the levels of the factors of interest. Instead, the test is observational in nature. Test planning for observational studies involves choosing where, when, and how long to evaluate a system in order to observe the possible combinations of factor levels that define the battlespace. This presentation discusses a test-planning method that uses Poisson process modeling as a way to estimate the length of time required to observe factor-level combinations in the operational environment. |
Brian Stone AFOTEC |
Contributed | materials | 2018 | ||||||||
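A sketch of the core calculation behind the entry above: if encounters with each factor-level combination arrive as independent Poisson processes with assumed rates, the time to the k-th encounter of each combination is gamma-distributed, and Monte Carlo gives the test length needed to observe every combination. The rates below are hypothetical.

```python
# Monte Carlo estimate of the test length needed to observe every factor-level
# combination at least once, given assumed Poisson encounter rates (hypothetical).
import numpy as np

rng = np.random.default_rng(12)
# encounters per test day for each of 2x3 = 6 factor-level combinations
rates = np.array([0.8, 0.5, 0.3, 0.25, 0.15, 0.05])
n_sims, k_required = 10_000, 1

# Time until the k-th arrival of a Poisson process is Gamma(k, scale=1/rate);
# the test is long enough once the slowest combination has reached k encounters.
times = rng.gamma(shape=k_required, scale=1.0 / rates, size=(n_sims, rates.size))
test_length = times.max(axis=1)

for q in (0.5, 0.8, 0.95):
    print(f"{int(q*100)}th percentile of required test length: "
          f"{np.quantile(test_length, q):.1f} days")
```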
Model credibility in statistical reliability analysis with limited data (Abstract)
Due to financial and production constraints, it has become increasingly common for analysts and test planners in defense applications to find themselves working with smaller amounts of data than seen in industry. These same analysts are also being asked to make strong statistical statements based on this limited data. For example, a common goal is ‘demonstrating’ a high reliability requirement with sparse data. In such situations, strong modeling assumptions are often used to achieve the desired precision. Such model-driven actions carry levels of risk that customers may not be aware of and that may be too high to be considered acceptable. There is a need to articulate and mitigate the risk associated with model form error in statistical reliability analysis. In this work, we review different views on model credibility from the statistical literature and discuss how these notions of credibility apply in data-limited settings. Specifically, we consider two different perspectives on model credibility: (1) data-driven credibility metrics for model fit, and (2) credibility assessments based on consistency of analysis results with prior beliefs. We explain how these notions of credibility can be used to drive test planning and recommend an approach to presenting analysis results in data-limited settings. We apply this approach to two case studies from reliability analysis: Weibull analysis and Neyer D-optimal test plans. |
Caleb King Sandia National Laboratories |
Contributed | materials | 2018 | ||||||||
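A worked example of the assumption-sensitivity the entry above describes, using standard zero-failure (success-run and Weibayes-style) demonstration planning rather than the authors' specific analyses: the test effort needed to "demonstrate" 90% reliability at 80% confidence drops sharply once a Weibull shape parameter is assumed. All requirement numbers are illustrative.

```python
# Units needed to "demonstrate" R = 0.90 at 80% confidence with zero failures,
# distribution-free versus under an assumed Weibull shape. Numbers are illustrative.
import numpy as np

R, C = 0.90, 0.80
base = np.log(1 - C) / np.log(R)              # ~15.3 "equivalent mission lives"
print(f"distribution-free: test {int(np.ceil(base))} units, one mission life each, zero failures")

# If a Weibull shape beta is *assumed*, testing each unit for m mission lives is worth
# m**beta equivalent lives, so the required unit count drops sharply:
for beta in (1.0, 2.0, 3.0):
    for m in (1.0, 2.0):
        n = int(np.ceil(base / m**beta))
        print(f"assumed beta = {beta}, test to {m:.0f}x mission life: {n:2d} units")
```

This is exactly the trade the abstract warns about: the shorter test is "bought" with a modeling assumption whose credibility should be assessed and communicated.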
Sound level recommendations for quiet sonic boom dose-response community surveys (Abstract)
The current ban on commercial overland supersonic flight may be replaced by a noise limit on sonic boom sound level. NASA is establishing a quiet sonic boom database to guide the new regulation. The database will consist of multiple community surveys used to model the dose-response relationship between sonic boom sound levels and human annoyance. There are multiple candidate dose-response modeling techniques, such as classical logistic regression and multilevel modeling. To plan for these community surveys, recommendations for data collection will be developed from pilot community test data. Two important aspects are selecting sample size and sound level range. Selection of sample size must be strategic as large sample sizes are costly whereas small sample sizes may result in more uncertainty in the estimates. Likewise, there are trade-offs associated with selection of the sound level range. If the sound level range includes excessively high sound levels, the public may misunderstand the potential impact of quiet sonic booms, resulting in a negative backlash against a promising technological advancement. Conversely, a narrow range that includes only low sound levels might exclude the eventual noise limit. This presentation will focus on recommendations for sound level range given the expected shape of the dose-response curve. |
Jasme Lee North Carolina State University |
Contributed | materials | 2018 | ||||||||
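A sketch of how the surveyed sound-level range affects the precision of a fitted logistic dose-response curve, using simulated annoyance responses; the assumed "true" curve, sound levels, and sample sizes are hypothetical, not NASA pilot-study data.

```python
# Simulated logistic dose-response: how the surveyed sound-level range affects the
# precision of the fitted slope. The "true" curve and levels are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(14)
b0, b1 = -18.0, 0.22                       # hypothetical true logit: annoyance vs level (dB)

def simulate_and_fit(level_lo, level_hi, n=600):
    levels = rng.uniform(level_lo, level_hi, size=n)
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * levels)))
    annoyed = rng.binomial(1, p)
    X = sm.add_constant(levels)
    fit = sm.Logit(annoyed, X).fit(disp=0)
    return fit.bse[1]                      # standard error of the slope

for lo, hi in [(70, 95), (70, 85), (80, 95)]:
    ses = [simulate_and_fit(lo, hi) for _ in range(50)]
    print(f"levels {lo}-{hi} dB: median SE(slope) = {np.median(ses):.4f}")
```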
Development of a Locking Setback Mass for Cluster Munition Applications: A UQ Case Study (Abstract)
The Army is currently developing a cluster munition that is required to meet a functional reliability requirement of 99%. This effort focuses on the design process for a setback lock within the safe and arm (S&A) device in the submunition fuze. This lock holds the arming rotor in place, thus preventing the fuze from beginning its arming sequence until the setback lock retracts during a launch event. Therefore, the setback lock is required to not arm (remain in place) during a drop event (safety) and to arm during a launch event (reliability). In order to meet these requirements, uncertainty quantification techniques were used to evaluate setback lock designs. We designed a simulation experiment, simulated the setback lock behavior in a drop event and in a launch event, fit a model to the results, and optimized the design for safety and reliability. Currently, 8 candidate designs that meet the requirements are being manufactured, and adaptive sensitivity testing is planned to inform the surrogate models and improve their predictive capability. A final optimized design will be chosen based on the improved models, and realistic drop safety and arm reliability predictions will be obtained using Monte Carlo simulations of the surrogate models. |
Melissa Jablonski U.S. ARMY ARMAMENT RESEARCH, DEVELOPMENT & ENGINEERING CENTER |
Contributed | materials | 2018 | ||||||||
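A sketch of the final step described above, Monte Carlo on a surrogate model to predict arming reliability; the quadratic "surrogate" for arming margin and the input tolerances are made up, not the setback-lock models.

```python
# Monte Carlo on a (made-up) surrogate to predict arming reliability against a
# 99% requirement; the margin model and tolerances below are hypothetical.
import numpy as np

rng = np.random.default_rng(15)

def arming_margin(spring_k, gap):            # hypothetical fitted surrogate; margin > 0 means the lock arms
    return 0.2 - 0.8 * (spring_k - 1.0) - 2.5 * (gap - 0.30) - 4.0 * (gap - 0.30) ** 2

n = 1_000_000
spring_k = rng.normal(1.0, 0.05, n)          # toleranced inputs (hypothetical)
gap = rng.normal(0.30, 0.02, n)

arms = arming_margin(spring_k, gap) > 0.0
p_hat = arms.mean()
se = np.sqrt(p_hat * (1 - p_hat) / n)
print(f"predicted arming reliability = {p_hat:.5f} +/- {1.96 * se:.5f} (95% CI)")
```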
Wednesday Keynote Speaker I |
Dr. Peter Parker Team Lead, Advanced Measurement Systems NASA (bio)
Dr. Parker is Team Lead for Advanced Measurement Systems at the National Aeronautics and Space Administration’s Langley Research Center in Hampton, Virginia. He serves as an Agency-wide statistical expert across all of NASA’s mission directorates of Exploration, Aeronautics, and Science to infuse statistical thinking, engineering, and methods including statistical design of experiments, response surface methodology, and measurement system characterization. His expertise is in collaboratively integrating research objectives, measurement sciences, test design, and statistical methods to produce actionable knowledge for aerospace research and development. He holds a B.S. in Mechanical Engineering, a M.S. in Applied Physics and Computer Science, and a M.S. and Ph.D. in Statistics from Virginia Tech. Dr. Parker is a senior member of the American Institute for Aeronautics and Astronautics, American Society for Quality, and the American Statistical Association. Dr. Parker currently chairs the American Society for Quality’s Publication Management Board and previously served as Editor-in-Chief of the journal Quality Engineering. |
Keynote |
| 2019 ||||||||
Tuesday Keynote |
Dr. David Chu President Institute for Defense Analyses (bio)
David Chu serves as President of the Institute for Defense Analyses. IDA is a non-profit corporation operating in the public interest. Its three federally funded research and development centers provide objective analyses of national security issues and related national challenges, particularly those requiring extraordinary scientific and technical expertise. As president, Dr. Chu directs the activities of more than 1,000 scientists and technologists. Together, they conduct and support research requested by federal agencies involved in advancing national security and advising on science and technology issues. Dr. Chu served in the Department of Defense as Under Secretary of Defense for Personnel and Readiness from 2001-2009, and earlier as Assistant Secretary of Defense and Director for Program Analysis and Evaluation from 1981-1993. From 1978-1981 he was the Assistant Director of the Congressional Budget Office for National Security and International Affairs. Dr. Chu served in the U. S. Army from 1968-1970. He was an economist with the RAND Corporation from 1970-1978, director of RAND’s Washington Office from 1994-1998, and vice president for its Army Research Division from 1998-2001. He earned a bachelor of arts in economics and mathematics, and his doctorate in economics, from Yale University. Dr. Chu is a member of the Defense Science Board and a Fellow of the National Academy of Public Administration. He is a recipient of the Department of Defense Medal for Distinguished Public Service with Gold Palm, the Department of Veterans Affairs Meritorious Service Award, the Department of the Army Distinguished Civilian Service Award, the Department of the Navy Distinguished Public Service Award, and the National Academy of Public Administration’s National Public Service Award. |
Keynote |
2019 | |||||||||
Thursday Keynote Speaker II |
Mr. Michael Little Program Manager, Advanced Information Systems Technology Earth Science Technology Office, NASA Headquarters (bio)
Over the past 45 years, Mike’s primary focus has been on the management of research and development, focusing on making the results more useful in meeting the needs of the user community. Since 1984, he has specialized in communications, data and processing systems, including projects in NASA, the US Air Force, the FAA and the Census Bureau. Before that, he worked on Major System Acquisition Programs, in the Department of Defense including Marine Corps combat vehicles and US Navy submarines. Currently, Mike manages a comprehensive program to provide NASA’s Earth Science research efforts with the information technologies it will need in the 2020-2035 time-frame to characterize, model and understand the Earth. This Program addresses the full range of data lifecycle from generating data using instruments and models, through the management of the data and including the ways in which information technology can help to exploit the data. Of particular interest today are the ways in which NASA can measure and understand transient and transitional phenomena and the impact of climate change. The AIST Program focuses the application of applied math and statistics, artificial intelligence, case-based reasoning, machine learning and automation to improve our ability to use observational data and model output in understanding Earth’s physical processes and natural phenomena. Training and odd skills: Application of cloud computing US Government Computer Security US Navy Nuclear Propulsion operations and maintenance on two submarines |
Keynote |
| 2019 ||||||||
Thursday Keynote Speaker I |
Dr. Wendy Martinez Director, Mathematical Statistics Research Center, Bureau of Labor Statistics ASA President-Elect (2020) (bio)
Wendy Martinez has been serving as the Director of the Mathematical Statistics Research Center at the Bureau of Labor Statistics (BLS) for six years. Prior to this, she served in several research positions throughout the Department of Defense. She held the position of Science and Technology Program Officer at the Office of Naval Research, where she established a research portfolio comprised of academia and industry performers developing data science products for the future Navy and Marine Corps. Her areas of interest include computational statistics, exploratory data analysis, and text data mining. She is the lead author of three books on MATLAB and statistics. Dr. Martinez was elected as a Fellow of the American Statistical Association (ASA) in 2006 and is an elected member of the International Statistical Institute. She was honored by the American Statistical Association when she received the ASA Founders Award at the JSM 2017 conference. Wendy is also proud and grateful to have been elected as the 2020 ASA President. |
Keynote |
materials | 2019 | ||||||||
Wednesday Keynote Speaker II |
Dr. Laura Freeman Associate Director, ISL Hume Center for National Security and Technology, Virginia Tech (bio)
Dr. Laura Freeman is an Assistant Director of the Operational Evaluation Division at the Institute for Defense Analyses. In that position, she established and developed an interdisciplinary analytical team of statisticians, psychologists, and engineers to advance scientific approaches to DoD test and evaluation. Her focus areas include test design, statistical data analysis, modeling and simulation validation, human-system interactions, reliability analysis, software testing, and cybersecurity testing. Dr. Freeman currently leads a research task for the Chief Management Officer (CMO) aiming to reform DoD testing. She guides an interdisciplinary team in recommending changes and developing best practices. Reform initiatives include incorporating mission context early in the acquisition lifecycle, integrating all test activities, and improving data management processes. During 2018, Dr. Freeman served as the acting Senior Technical Advisor for Director Operational Test and Evaluation (DOT&E). As the Senior Technical Advisor, Dr. Freeman provided leadership, advice, and counsel to all personnel on technical aspects of testing military systems. She served as a liaison with Service technical advisors, General Officers, and members of the Senior Executive Service on key technical issues. She reviewed test strategies, plans, and reports from all systems on DOT&E oversight. During her tenure at IDA, Dr. Freeman has designed tests and conducted statistical analyses for programs of national importance including weapon systems, missile defense, undersea warfare systems, command and control systems, and most recently the F-35. She prioritizes supporting the analytical community in the DoD workforce. She developed and taught numerous courses on advanced test design and statistical analysis, including two new Defense Acquisition University (DAU) courses on statistical methods. She is a founding organizer of DATAWorks (Defense and Aerospace Test and Analysis Workshop), a workshop designed to share new methods, provide training, and share best practices between NASA, the DoD, and National Labs. Dr. Freeman is the recipient of the 2017 IDA Goodpaster Award for Excellence in Research and the 2013 International Test and Evaluation Association (ITEA) Junior Achiever Award. She is a member of the American Statistical Association, the American Society for Quality, the International Statistical Engineering Association, and ITEA. She serves on the editorial boards of Quality Engineering, Quality and Reliability Engineering International, and the ITEA Journal. Her areas of statistical expertise include designed experiments, reliability analysis, and industrial statistics. Prior to joining IDA in 2010, Dr. Freeman worked at SAIC providing statistical guidance to the Director, Operational Test and Evaluation. She also consulted with NASA on various projects. In 2008, Dr. Freeman established the Laboratory for Interdisciplinary Statistical Analyses at Virginia Tech and served as its inaugural Director. Dr. Freeman has a B.S. in Aerospace Engineering, a M.S. in Statistics and a Ph.D. in Statistics, all from Virginia Tech. Her Ph.D. research was on design and analysis of experiments for reliability data. |
Keynote |
| 2019 ||||||||
Thursday Lunchtime Keynote Speaker |
Dr. T. Charles Clancy Bradley Professor of Electrical and Computer Engineering Virginia Tech (bio)
Charles Clancy is the Bradley Professor of Electrical and Computer Engineering at Virginia Tech where he serves as the Executive Director of the Hume Center for National Security and Technology. Clancy leads a range of strategic programs at Virginia Tech related to security, including the Commonwealth Cyber Initiative. Prior to joining VT in 2010, Clancy was an engineering leader in the National Security Agency, leading research programs in digital communications and signal processing. He received his PhD from the University of Maryland, MS from University of Illinois, and BS from the Rose-Hulman Institute of Technology. He is co-author of over 200 peer-reviewed academic publications, six books, and over twenty patents, and co-founder of five venture-backed startup companies. |
Keynote |
| 2019 ||||||||
Wednesday Keynote Speaker III |
Mr. Timothy Dare Deputy Director, Developmental Test, Evaluation, and Prototyping SES OUSD(R&E) (bio)
Mr. Timothy S. Dare is the Deputy Director for Developmental Test, Evaluation and Prototyping (DD(DTEP)). As the DD(DTEP), he serves as the principal advisor on developmental test and evaluation (DT&E) to the Secretary of Defense, Under Secretary of Defense for Research and Engineering, and Director of Defense Research and Engineering for Advanced Capabilities. Mr. Dare is responsible for DT&E policy and guidance in support of the acquisition of major Department of Defense (DoD) systems, and providing advocacy, oversight, and guidance to the DT&E acquisition workforce. He informs policy and advances leading edge technologies through the development of advanced technology concepts, and developmental and operational prototypes. By working closely with interagency partners, academia, industry and governmental labs, he identifies, develops and demonstrates multi-domain technologies and concepts that address high-priority DoD, multi-Service, and Combatant Command warfighting needs. Prior to his appointment in December 2018, Mr. Dare was a Senior Program Manager for program management and capture at Lockheed Martin (LM) Space. In this role he was responsible for the capture and execution phases of multiple Intercontinental Ballistic Missile programs for Minuteman III, including a new airborne Nuclear Command and Control (NC2) development program. His major responsibilities included establishing program working environments at multiple locations, policies, processes, staffing, budget and technical baselines. Mr. Dare has extensive T&E and prototyping experience. As the Engineering Program Manager for the $1.8B Integrated Space C2 programs for NORAD/NORTHCOM systems at Cheyenne Mountain, Mr. Dare was the Integration and Test lead focusing on planning, executing, and evaluating the integration and test phases (developmental and operational T&E) for Missile Warning and Space Situational Awareness (SSA) systems. Mr. Dare has also been the Engineering Lead/Integration and Test lead on other systems such as the Hubble Space Telescope; international border control systems; artificial intelligence (AI) development systems (knowledge-based reasoning); Service-based networking systems for the UK Ministry of Defence; Army C2 systems; Space Fence C2; and foreign intelligence, surveillance, and reconnaissance systems. As part of the Department’s strategic defense portfolio, Mr. Dare led the development of advanced prototypes in SSA C2 (Space Fence), Information Assurance (Single Sign-on), AI systems, and was the sponsoring program manager for NC2 capability development. Mr. Dare is a graduate of Purdue University and is a member of both the Association for Computing Machinery and Program Management Institute. He has been recognized by the U.S. Air Force for his contributions supporting NORAD/NORTHCOM’s strategic defense missions, and the National Aeronautics and Space Administration for his contributions to the original Hubble Space Telescope program. Mr. Dare holds a U.S. Patent for Single Sign-on architectures. |
Keynote |
| 2019 ||||||||
Wednesday Lunchtime Keynote Speaker |
Dr. Jared Freeman Chief Scientist of Aptima and Chair of the Human Systems Division National Defense Industry Association (bio)
Jared Freeman, Ph.D., is Chief Scientist of Aptima and Chair of the Human Systems Division of the National Defense Industry Association. His research and publications address measurement, assessment, and enhancement of human learning, cognition, and performance in technologically complex military environments. |
Keynote |
| 2019 ||||||||
Welcoming & Opening Keynote-Tuesday AM |
Dr. Mike Gilmore Director DOT&E (bio)
Link to Bio unavail |
Keynote |
materials | 2016 | ||||||||
Lunch with Keynote Leadership Perspective |
Dr. Jon Hollday NASA (bio)
Link to Bio unavail |
Keynote |
2016 | |||||||||
STAT Engineering Keynote-Wednesday AM |
Dr. Christine Anderson-Cook Statistics Los Alamos National Lab (bio)
s in the areas of complex system reliability, non-proliferation, malware detection and statistical process control. Before joining LANL, she was a faculty member in the Department of Statistics at Virginia Tech for 8 years. Her research areas include response surface methodology, design of experiments, reliability, multiple criterion optimization and graphical methods. She has authored more than 130 articles in statistics and quality peerreviewed journals, and has been a long time contributor to the Quality Progress Statistics Roundtable column. In 2012, she edited a special issue in Quality Engineering on Statistical Engineering with Lu Lu. She is an elected fellow of the American Statistical Association and the American Society for Quality. In 2012 she was honored with the ASQ Statistics Division William G. Hunter Award. In 2011 she received the 26th Annual Governor’s Award for Outsta |
Keynote |
2016 | |||||||||
Lunch With Leadership Perspective-Wednesday PM |
Dr. David Brown Deputy Assistant Secretary of Defense, Developmental Test & Evaluation (bio)
Link to Bio unavail |
Keynote |
2016 | |||||||||
Test and Evaluation Matters for the Warfighter and We Need Your Help to Make it Better |
Dr. Catherine Warner Science Advisor, DOT&E, Office of the Secretary of Defense (bio)
Dr. Catherine Warner serves as the Science Advisor for the Director, Operational Test and Evaluation within the Office of the Secretary of Defense. Dr. Warner has been involved with operational test and evaluation since 1991, when she became a research staff member at the Institute for Defense Analyses (IDA). In that capacity, Dr. Warner performed and directed analysis of operational tests for Army, Navy, and Air Force systems in support of DOT&E. From 2005 – 2010, Dr. Warner was an Assistant Director at IDA and also served as the lead for the Air Warfare group. Her analysis portfolio included major aircraft systems such as the F-22, F/A-18E/F, V-22, and H-1. Prior to that, Dr. Warner was the lead analyst for Unmanned Aerial Vehicle (UAV) systems including Predator, Shadow, Hunter, and Global Hawk. In 2013 at the request of the Defense Information Systems Agency (DISA), Dr. Warner deployed to Kabul, Afghanistan for 16 months in support of NATO’s International Security Assistance Force (ISAF) and the US Operation Enduring Freedom (OEF). She led a team of Information Technology specialists in advising Afghanistan’s Ministry of Communications and Information Technology on enhancing national communication capabilities for the security and economic growth of the country. The primary focus of this team included supporting the completion of Afghanistan’s National Fiber Optic Ring, Spectrum Management, and Cyber Security. Dr. Warner previously worked at the Lawrence Livermore National Laboratory. She grew up in Albuquerque, New Mexico, attended the University of New Mexico and San Jose State University where she earned both B.S. and M.S. degrees in Chemistry. She also earned both M.A. and Ph.D. degrees in Chemistry from Princeton University. |
Keynote |
materials | 2017 | ||||||||
Retooling Design and Development |
Mr. Chris Singer NASA Deputy Chief Engineer NASA (bio)
Christopher (Chris) E. Singer is the NASA Deputy Chief Engineer, responsible for integrating engineering across the Agency’s 10 field centers. Prior to this appointment in April 2016, he served as the Engineering Director at NASA’s Marshall Space Flight Center in Huntsville, Alabama. Appointed in 2011, Mr. Singer led an organization of 1,400 civil service and 1,200 support contractor employees responsible for the design, testing, evaluation, and operation of hardware and software associated with space transportation, spacecraft systems, science instruments and payloads under development at the Marshall Center. The Engineering Directorate also manages NASA’s Payload Operations Center at Marshall, which is the command post for scientific research activities on-board the International Space Station. Mr. Singer began his NASA career in 1983 as a rocket engine specialist. In 1992, he served a one-year assignment at NASA Headquarters in Washington, DC, as senior manager for the space shuttle main engine and external tank in the Space Shuttle Support Office. In 1994, Mr. Singer supervised the development and implementation of safety improvements and upgrades to shuttle propulsion components. In 2000, he was appointed chief engineer in the Space Transportation Directorate, then was selected as deputy director of Marshall’s Engineering Directorate from 2004 to 2011. Mr. Singer is an AIAA Associate Fellow. In 2006, he received the Presidential Rank Award for Meritorious Executives — the highest honor for career federal employees. He was awarded the NASA Outstanding Leadership Medal in 2001 and 2008 for his leadership. In 1989, he received the prestigious Silver Snoopy Award from the Astronaut Corps for his contributions to the success of human spaceflight missions. A native of Nashville, Tennessee, Mr. Singer earned a bachelor’s degree in mechanical engineering in 1983 from Christian Brothers University in Memphis, Tennessee. Chris enjoys woodworking, fishing, and hang gliding. Chris is married to the former Jody Adams of Hartselle, Alabama. They have three children and live in Huntsville, Alabama. |
Keynote |
materials | 2017 | ||||||||
Reflections on Statistical Engineering and its Application |
Dr. Geoff Vining Professor Virginia Tech (bio)
Geoff Vining is a Professor of Statistics at Virginia Tech. From 1999 – 2006, he also was the department head. He currently is the ASQ Treasurer for 2016 and Past-Chair of the ASQ Technical Communities Council. He is a Fellow of the ASQ, a Fellow of the American Statistical Association (ASA), and an Elected Member of the International Statistical Institute. Dr. Vining served as Editor of the Journal of Quality Technology from 1998 – 2000 and as Editor-in-Chief of Quality Engineering from 2008-2009. He also has served as Chair of the ASQ Publications Management Board, as Chair of the ASQ Statistics Division, and as Chair of the ASA Quality and Productivity Section. Dr. Vining won the 2010 Shewhart Medal, the ASQ career award given annually to the person not previously so honored who has demonstrated the most outstanding technical leadership in the field of modern quality control, especially through the development to its theory, principles, and techniques. He also received the 2015 Box Medal from the European Network for Business and Industrial Statistics (ENBIS). This medal recognizes each year an extraordinary statistician who has remarkably contributed with his/her work to the development and the application of statistical methods in European business and industry. In 2013, he received an Engineering Excellence Award from the NASA Engineering and Safety Center. He received the 2011 William G. Hunter Award from the ASQ Statistics Division for excellence in statistics as a communicator, a consultant, an educator, an innovator, an integrator of statistics with other disciplines and an implementer who obtains meaningful results. He won the 1990 ASQ Brumbaugh Award for the paper published in an ASQ journal that made the greatest contribution to the development of industrial applications of quality control and the 2005 Lloyd Nelson Award from the Statistics Division for the paper published in the Journal of Quality Technology that had the greatest immediate impact to practitioners. Dr. Vining is the author of three textbooks. He is an internationally recognized expert in the use of experimental design for quality, productivity, and reliability improvement and in the application of statistical process control. He has extensive consulting experience, most recently with NASA and the U.S. Department of Defense. |
Keynote |
materials | 2017 | ||||||||
Developmental Test and Evaluation |
Dr. Brian Hall Principal Deputy Director DoD (bio)
Dr. Hall was appointed to the Senior Executive Service in November 2014 as the Principal Deputy Director for Developmental Test and Evaluation, and is currently serving as the Principal Deputy Director of the Test Resource Management Center (TRMC). In this position, he oversees matters concerning the Nation’s critical test range infrastructure, science and technology efforts, development of the biennial Strategic Plan for DoD Test and Evaluation (T&E) resources, as well as certification of the Services’ T&E budgets. Prior to this position, Dr. Hall was the Technical Advisor for Operational Test and Evaluation (OT&E) of all Land and Expeditionary Warfare systems in OSD-DOT&E. In this position, he advised the highest authorities in DoD for OT&E, observed operational testing, and coauthored Operational Assessments and Beyond Low-Rate Initial Production reports submitted to the 4 Congressional Defense Committees. Prior to serving on the OSD staff, Dr. Hall was the Division Chief for Aviation, Missiles, and C4ISR Systems in the Army Test and Evaluation Command (ATEC), where he was responsible for supervising the development of test plans and evaluations that directly supported milestone decision reviews and materiel fielding/production decisions of more than 300 Army programs. While with the Army, Dr. Hall was one of the leading reliability experts who helped establish the Center for Reliability Growth at Aberdeen Proving Ground, as well as develop and administer the ATEC/AMSAA 3-day reliability course to improve defense acquisition learning. Over his career, Dr. Hall has led studies, developed methodologies, presented research, published papers, crafted policy, and authored policy implementation guides. He has also developed staff, advised numerous defense programs, and served as an executive member of, or invited contributor to: tri-service DoD Blue Ribbon Panels; National Academy of Science studies; and DoD working-groups to improve system reliability. Dr. Hall is a Senior Service College graduate and has earned advanced degrees in Applied Mathematics, Reliability Engineering, and Strategic Studies. He is a domain expert in Reliability Engineering, as well as in Reliability Growth Management and Methodology. He has developed statistical methods and reliability growth models that have been: published in international journals, incorporated into Military Handbooks, adopted by Operational Test Agency policy, and utilized to shape growth plans and assess reliability maturity of numerous DoD systems. |
Keynote![]() |
materials | 2017 | ||||||||
Mr. Dave Duma Acting Director, Operational Test and Evaluation, Office of the Secretary of Defense DOT&E (bio)
Mr. Duma is the Acting Director, Operational Test and Evaluation as of January 20, 2017. Mr. Duma was appointed as the Principal Deputy Director, Operational Test and Evaluation in January 2002. In this capacity he is responsible for all functional areas assigned to the office. He participates in the formulation, development, advocacy, and oversight of policies of the Secretary of Defense and in the development and implementation of test and test resource programs. He oversees the planning, conduct, analysis, evaluation, and reporting of operational and live fire testing. He serves as the Appropriation Director and Comptroller for the Operational Test and Evaluation, Defense Appropriation and coordinates all Planning, Programming, and Budgeting Execution matters. He previously served as Acting Director, Operational Test and Evaluation from February 2005 to July 2007 and again from May 2009 to September 2009. Mr. Duma also served as the Acting Deputy Director, Operational Test and Evaluation from January 1992 to June 1994. In this capacity he was responsible for oversight of the planning, conduct, analysis, and reporting of operational test and evaluation for all major conventional weapons systems in the Department of Defense. He supervised the development of evaluation plans and test program strategies, observed the conduct of operational test events, evaluated operational field tests of all armed services, and submitted final reports for Congress. Mr. Duma returned to government service from the commercial sector. In private industry he worked on a variety of projects involving test and evaluation; requirements generation; command, control, communications, intelligence, surveillance and reconnaissance; modeling and simulation; and software development. Mr. Duma has 30 years of naval experience during which he was designated as a Joint Service Officer. He served as the Director, Test and Evaluation Warfare Systems for the Chief of Naval Operations, the Deputy Commander, Submarine Squadron TEN, and he commanded the nuclear-powered submarine USS SCAMP (SSN 588). Mr. Duma holds Master of Science degrees in National Security and Strategic Studies and in Management. He holds a Bachelor of Science degree in Nuclear Engineering. He received the U.S. Presidential Executive Rank Award on two occasions: the Meritorious Executive Award in 2008 and the Distinguished Executive Rank Award in 2015. He is a member of the International Test and Evaluation Association. |
Keynote![]() |
2017 | ||||||||||
Opening Keynote |
Dr. David Chu President IDA (bio)
David Chu serves as President of the Institute for Defense Analyses. IDA is a non-profit corporation operating in the public interest. Its three federally funded research and development centers provide objective analyses of national security issues and related national challenges, particularly those requiring extraordinary scientific and technical expertise. As president, Dr. Chu directs the activities of more than 1,000 scientists and technologists. Together, they conduct and support research requested by federal agencies involved in advancing national security and advising on science and technology issues. Dr. Chu served in the Department of Defense as Under Secretary of Defense for Personnel and Readiness from 2001-2009, and earlier as Assistant Secretary of Defense and Director for Program Analysis and Evaluation from 1981-1993. From 1978-1981 he was the Assistant Director of the Congressional Budget Office for National Security and International Affairs. Dr. Chu served in the U. S. Army from 1968-1970. He was an economist with the RAND Corporation from 1970-1978, director of RAND’s Washington Office from 1994-1998, and vice president for its Army Research Division from 1998-2001. He earned a bachelor of arts in economics and mathematics, and his doctorate in economics, from Yale University. Dr. Chu is a member of the Defense Science Board and a Fellow of the National Academy of Public Administration. He is a recipient of the Department of Defense Medal for Distinguished Public Service with Gold Palm, the Department of Veterans Affairs Meritorious Service Award, the Department of the Army Distinguished Civilian Service Award, the Department of the Navy Distinguished Public Service Award, and the National Academy of Public Administration’s National Public Service Award. |
Keynote![]() |
![]() | 2018 | ||||||||
Closing Remarks |
Mr. Robert Behler Director DOT&E (bio)
Robert F. Behler was sworn in as Director of Operational Test and Evaluation on December 11, 2017. A Presidential appointee confirmed by the United States Senate, he serves as the senior advisor to the Secretary of Defense on operational and live fire test and evaluation of Department of Defense weapon systems. Prior to his appointment, he was the Chief Operating Officer and Deputy Director of the Carnegie Mellon University Software Engineering Institute (SEI), a Federally Funded Research and Development Center. SEI is a global leader in advancing software development and cybersecurity to solve the nation's toughest problems through focused research, development, and transition to the broader software engineering community. Before joining the SEI, Mr. Behler was the President and CEO of SRC, Inc. (formerly the Syracuse Research Corporation). SRC is a not-for-profit research and development corporation with a for-profit manufacturing subsidiary that focuses on radar, electronic warfare and cybersecurity technologies. Prior to working at SRC, Mr. Behler was the General Manager and Senior Vice President of the MITRE Corp., where he provided leadership to more than 2,500 technical staff in 65 worldwide locations. He joined MITRE from the Johns Hopkins University Applied Physics Laboratory where he was a General Manager for more than 350 scientists and engineers as they made significant contributions to critical Department of Defense (DOD) precision engagement challenges. General Behler served 31 years in the United States Air Force, retiring as a Major General in 2003. During his military career, he was the Principal Adviser for Command and Control, Intelligence, Surveillance and Reconnaissance (C2ISR) to the Secretary and Chief of Staff of the U.S. Air Force (USAF). International assignments as a general officer included the Deputy Commander for NATO's Joint Headquarters North in Stavanger, Norway. He was the Director of the Senate Liaison Office for the USAF during the 104th Congress. Mr. Behler also served as the assistant for strategic systems to the Director of Operational Test and Evaluation. As an experimental test pilot, he flew more than 65 aircraft types. Operationally he flew worldwide reconnaissance missions in the fastest aircraft in the world, the SR-71 Blackbird. Mr. Behler is a Fellow of the Society of Experimental Test Pilots and an Associate Fellow of the American Institute of Aeronautics and Astronautics. He is a graduate of the University of Oklahoma where he received a B.S. and M.S. in aerospace engineering, has an MBA from Marymount University, and was a National Security Fellow at the JFK School of Government at Harvard University. Mr. Behler has recently been on several National Research Council studies for the National Academy of Sciences including: “Critical Code: Software Producibility,” “Achieving Effective Acquisition of Information Technology in the Department of Defense,” and “Development Planning: A Strategic Approach to Future Air Force Capabilities.” |
Keynote![]() |
2018 | |||||||||
Testing and Analytical Challenges on the Path to Hypersonic Flight |
Dr. Mark Lewis Director IDA-STPI (bio)
Dr. Mark J. Lewis is the Director of IDA’s Science and Technology Policy Institute, a federally funded research and development center. He leads an organization that provides analysis of national and international science and technology issues to the Office of Science and Technology Policy in the White House, as well as other Federal agencies including the National Institutes of Health, the National Science Foundation, NASA, the Department of Energy, Homeland Security, and the Federal Aviation Administration, among others. Prior to taking charge of STPI, Dr. Lewis served as the Willis Young, Jr. Professor and Chair of the Department of Aerospace Engineering at the University of Maryland. A faculty member at Maryland for 24 years, Dr. Lewis taught and conducted basic and applied research. From 2004 to 2008, Dr. Lewis was the Chief Scientist of the U.S. Air Force. From 2010 to 2011, he was President of the American Institute of Aeronautics and Astronautics (AIAA). Dr. Lewis attended the Massachusetts Institute of Technology, where he received a Bachelor of Science degree in aeronautics and astronautics, a Bachelor of Science degree in earth and planetary science (1984), a Master of Science (1985), and a Doctor of Science (1988) in aeronautics and astronautics. Dr. Lewis is the author of more than 300 technical publications and has been an adviser to more than 70 graduate students. Dr. Lewis has also served on various advisory boards for NASA, the Air Force, and DoD, including two terms on the Air Force Scientific Advisory Board, the NASA Advisory Council, and the Aeronautics and Space Engineering Board of the National Academies. Dr. Lewis’s awards include the Meritorious Civilian Service Award and Exceptional Civilian Service Award; he was also recognized as the 1994 AIAA National Capital Young Scientist/Engineer of the Year, received the IECEC/AIAA Lifetime Achievement Award, and is an Aviation Week and Space Technology Laureate (2007). |
Keynote![]() |
![]() | 2018 | ||||||||
Journey to a Data-Centric Approach for National Security |
Dr. Marcey Hoover Quality Assurance Director Sandia National Laboratories (bio)
As Quality Assurance Director, Dr. Marcey Hoover is responsible for designing and sustaining the Laboratories’ quality assurance system and the associated technical capabilities needed for flawless execution of safe, secure, and efficient work to deliver exceptional products and services to its customers. Marcey previously served as the Senior Manager responsible for developing the science and engineering underpinning efforts to predict and influence the behavior of complex, highly interacting systems critical to our nation’s security posture. In her role as Senior Manager and Chief of Operations for Sandia’s Energy and Climate program, Marcey was responsible for strategic planning, financial management, business development, and communications. In prior positions, she managed organizations responsible for (1) quality engineering on new product development programs, (2) research and development of advanced computational techniques in the engineering sciences, and (3) development and execution of nuclear weapon testing and evaluation programs. Marcey has also led several executive office functions, including corporate-level strategic planning. Active in both the American Statistical Association and the American Society for Quality (ASQ), Marcey served two terms as the elected ASQ Statistics Division Treasurer. She was recognized as the Outstanding Alumni of the Purdue University Statistics Department in 2009 and nominated in 2011 for the YWCA Middle Rio Grande Women on the Move award. She currently serves on both the Purdue Strategic Research Advisory Council and the Statistics Alumni Advisory Board, and as a mentor for Big Brothers Big Sisters of New Mexico. Marcey received her bachelor of science degree in mathematics from Michigan State University, and her master of science and doctor of philosophy degrees in mathematical statistics from Purdue University. |
Keynote![]() |
![]() | 2018 | ||||||||
NASA AERONAUTICS |
Mr. Bob Pearce Deputy Associate Administrator for Strategy, Aeronautics Research Mission Directorate, NASA (bio)
Mr. Pearce is responsible for leading aeronautics research mission strategic planning to guide the conduct of the agency’s aeronautics research and technology programs, as well as leading ARMD portfolio planning and assessments, mission directorate budget development and approval processes, and review and evaluation of all of NASA’s aeronautics research mission programs for strategic progress and relevance. Pearce is also currently acting director for ARMD’s Airspace Operations and Safety Program, and responsible for the overall planning, management and evaluation of foundational air traffic management and operational safety research. Previously he was director for strategy, architecture and analysis for ARMD, responsible for establishing a strategic systems analysis capability focused on understanding the system-level impacts of NASA’s programs, the potential for integrated solutions, and the development of high-leverage options for new investment and partnership. From 2003 until July 2010, Pearce was the deputy director of the FAA-led Next Generation Air Transportation System (NextGen) Joint Planning and Development Office (JPDO). The JPDO was an interagency office tasked with developing and facilitating the implementation of a national plan to transform the air transportation system to meet the long-term transportation needs of the nation. Prior to the JPDO, Pearce held various strategic and program management positions within NASA. In the mid-1990s he led the development of key national policy documents including the National Science and Technology Council’s “Goals for a National Partnership in Aeronautics Research and Technology” and the “Transportation Science and Technology Strategy.” These two documents provided a substantial basis for NASA’s expanded investment in aviation safety and airspace systems. He began his career as a design engineer at the Grumman Corporation, working on such projects as the Navy’s F-14 Tomcat fighter and DARPA’s X-29 Forward Swept Wing Demonstrator. Pearce also has experience from the Department of Transportation’s Volpe National Transportation Systems Center where he made contributions in the area of advanced concepts for intercity transportation systems. Pearce has received NASA’s Exceptional Service Medal for sustained excellence in planning and advocating innovative aeronautics programs in conjunction with the White House and other federal agencies. He received NASA’s Exceptional Achievement Medal for outstanding leadership of the JPDO in support of the transformation of the nation’s air transportation system. Pearce has also received NASA’s Cooperative External Achievement Award and several Exceptional Performance and Group Achievement Awards. He earned a bachelor of science degree in mechanical and aerospace engineering from Syracuse University, and a master of science degree in technology and policy from the Massachusetts Institute of Technology. |
Keynote![]() |
materials | 2018 | ||||||||
Highly Effective Statistical Collaboration: The Art and the Science |
Dr. Peter Parker Team Lead Advanced Measurement Systems NASA (bio)
Dr. Parker is Team Lead for Advanced Measurement Systems at the National Aeronautics and Space Administration’s Langley Research Center in Hampton, Virginia. He serves as an Agency-wide statistical expert across all of NASA’s mission directorates of Exploration, Aeronautics, and Science to infuse statistical thinking, engineering, and methods including statistical design of experiments, response surface methodology, and measurement system characterization. His expertise is in collaboratively integrating research objectives, measurement sciences, test design, and statistical methods to produce actionable knowledge for aerospace research and development. He holds a B.S. in Mechanical Engineering, an M.S. in Applied Physics and Computer Science, and an M.S. and Ph.D. in Statistics from Virginia Tech. Dr. Parker is a senior member of the American Institute of Aeronautics and Astronautics, the American Society for Quality, and the American Statistical Association. Dr. Parker currently chairs the American Society for Quality’s Publication Management Board and previously served as Editor-in-Chief of the journal Quality Engineering. |
Keynote![]() |
![]() | 2018 | ||||||||
Consensus Building |
Dr. Antonio Possolo NIST Fellow, Chief Statistician National Institute of Standards and Technology. (bio)
Antonio Possolo holds a Ph.D. in statistics from Yale University, and has been practicing the statistical arts for more than 35 years, in industry (General Electric, Boeing), academia (Princeton University, University of Washington in Seattle, Classical University of Lisboa), and government. He is committed to the development and application of probabilistic and statistical methods that contribute to advances in science and technology, and in particular to measurement science. |
Keynote![]() |
materials | 2018 | ||||||||
Opening Keynote |
Mr. Dave Duma Assistant Director Operational Test and Evaluation (bio)
Mr. Duma is the Acting Director, Operational Test and Evaluation as of January 20, 2017. Mr. Duma was appointed as the Principal Deputy Director, Operational Test and Evaluation in January 2002. In this capacity he is responsible for all functional areas assigned to the office. He participates in the formulation, development, advocacy, and oversight of policies of the Secretary of Defense and in the development and implementation of test and test resource programs. He oversees the planning, conduct, analysis, evaluation, and reporting of operational and live fire testing. He serves as the Appropriation Director and Comptroller for the Operational Test and Evaluation, Defense Appropriation and coordinates all Planning, Programming, and Budgeting Execution matters. He previously served as Acting Director, Operational Test and Evaluation from February 2005 to July 2007 and again from May 2009 to September 2009. Mr. Duma also served as the Acting Deputy Director, Operational Test and Evaluation from January 1992 to June 1994. In this capacity he was responsible for oversight of the planning, conduct, analysis, and reporting of operational test and evaluation for all major conventional weapons systems in the Department of Defense. He supervised the development of evaluation plans and test program strategies, observed the conduct of operational test events, evaluated operational field tests of all armed services, and submitted final reports for Congress. Mr. Duma returned to government service from the commercial sector. In private industry he worked on a variety of projects involving test and evaluation; requirements generation; command, control, communications, intelligence, surveillance and reconnaissance; modeling and simulation; and software development. Mr. Duma has 30 years of naval experience during which he was designated as a Joint Service Officer. He served as the Director, Test and Evaluation Warfare Systems for the Chief of Naval Operations, the Deputy Commander, Submarine Squadron TEN, and he commanded the nuclear-powered submarine USS SCAMP (SSN 588). Mr. Duma holds Master of Science degrees in National Security and Strategic Studies and in Management. He holds a Bachelor of Science degree in Nuclear Engineering. He received the U.S. Presidential Executive Rank Award on two occasions: the Meritorious Executive Award in 2008 and the Distinguished Executive Rank Award in 2015. He is a member of the International Test and Evaluation Association. |
Keynote![]() |
![]() | 2018 | ||||||||
Morning Keynote |
Greg Zacharias Chief Scientist DOT&E (bio)
Dr. Greg Zacharias serves as Chief Scientist to the Director of Operational Test and Evaluation, providing scientific and technical (S&T) guidance on the overall approach to assessing the operational effectiveness, suitability, and survivability of major DOD weapon systems. He advises the DOT&E in critical S&T areas including: emerging technologies; modeling and simulation (M&S); human-systems integration; and test design/analysis. Dr. Zacharias also represents the DOT&E on technical groups focused on policy, programs, and technology assessments, interacting with the DOD, industry, and academia. Before this appointment, Dr. Zacharias was the Chief Scientist of the US Air Force (USAF), advising the Secretary and the Chief of Staff, providing assessments on a range of S&T issues affecting the Air Force mission, and interacting with other Air Staff principals, acquisition organizations, and S&T communities. He served on the Executive Committee of the Air Force Scientific Advisory Board (SAB), and was the principal USAF S&T representative to the civilian scientific/engineering community and the public. His office published an autonomous systems roadmap entitled “Autonomous Horizons: The Way Forward.” Earlier, Dr. Zacharias served as President and Senior Principal Scientist of Charles River Analytics, providing strategic direction for the Government Services and Commercial Solutions Divisions. Before co-founding Charles River, he was a Senior Scientist at Raytheon/BBN, where he developed and applied models of human decision-making in multi-agent dynamic environments. Earlier, as a Research Engineer at the CS Draper Laboratory, Dr. Zacharias focused on advanced human/machine interface design issues for the Space Shuttle, building on an earlier USAF assignment at NASA, where he was responsible for preliminary design definition of the Shuttle reentry flight control system. Dr. Zacharias served on the Air Force SAB for eight years, contributing to nine summer studies, including chairing a study on “Future Operations Concepts for Unmanned Aircraft Systems.” As a SAB member he also chaired the Human System Wing Advisory Group, was a member of Air Combat Command’s Advisory Group, and served as a technical program reviewer for the Air Force Research Laboratory. He was a member of the National Research Council (NRC) Committee on Human-Systems Integration for over ten years, supporting several NRC studies including a DMSO-sponsored study of military human behavior models, and co-chairing a follow-up USAF-sponsored study to identify promising DOD S&T investments in the area. He has served on the DOD Human Systems Technology Area Review and Assessment (TARA) Panel, Embry-Riddle’s Research Advisory Board, MIT’s Engineering Systems Division Advisory Board, the Board of the Small Business Technology Council (SBTC), and was the founding Chair of the Human Systems Division of the National Defense Industrial Association (NDIA). Dr. Zacharias obtained his BS, MS, and PhD degrees in Aeronautics and Astronautics at MIT, where he was an MIT Sloan Scholar. He is a Distinguished Graduate and Distinguished Alumnus of USAF Officer Training School (OTS), and has received the USAF Exceptional Civilian Service Award, and twice received the USAF Meritorious Civilian Service Award. |
Keynote![]() |
2020 | |||||||||
Lunch Keynote |
Yisroel Brumer Principal Deputy Director CAPE (bio)
Dr. Yisroel Brumer is the Principal Deputy Director of Cost Assessment and Program Evaluation (CAPE) in the Office of the Secretary of Defense. In this role, he oversees all CAPE analysis and activities including strategic studies, programmatic analysis, and cost estimates across the entire Department of Defense. In particular, he leads CAPE’s involvement in the annual program and budget review process, providing oversight and decision support for content and funding for the entire DoD five-year fiscal year defense program. Within the acquisitions process, he leads CAPE’s oversight of all major investment programs across the DoD, with particular emphasis on the analysis of alternative investment strategies, cost estimation, cost-benefit analyses, economic analyses, and programmatic tradeoffs. He also guides the Strategic Portfolio Reviews – cross-cutting studies requested by the Secretary of Defense on high-interest issues critical to the success of the Department. Finally, he oversees Independent Cost Estimates and cost analysis, ensuring that such processes provide accurate information and realistic estimates of acquisition program cost to senior Department leaders and Congressional defense committees. From 2017 to 2018, Dr. Brumer was CAPE’s Deputy Director for Analysis and Innovation, where he was responsible for executing major cross-cutting analyses across the entire Defense portfolio. Beginning in 2012, he led CAPE’s Strategic, Defensive, and Space Programs Division, providing advice and analysis on over $60B per year of programs ranging from antiterrorism to intercontinental ballistic missiles. In that role, he was hand-picked to oversee the Secretary of Defense’s number one priority, a multibillion dollar revamp of the entire nuclear enterprise, including a cultural overhaul and the initial stages of the Nuclear Triad modernization. From 2010 to 2012, he served as Director of CAPE’s Program Analysis Division, where he led major cross-cutting analyses and all DoD Front End Assessments, all selected by and briefed directly to the Secretary of Defense. Dr. Brumer first joined CAPE in 2005 as an Operations Research Analyst, where he conducted analysis and provided advice to senior leaders on analytical tradeoffs in the DoD’s science and technology, homeland defense, nuclear command and control, and combating weapons of mass destruction portfolios. Dr. Brumer holds a Ph.D. in Chemical Physics and a Master of Science in Chemistry from Harvard University, as well as a Bachelor of Science in Chemistry from the University of Toronto. After conducting postdoctoral research at Harvard on the physics of complex biological systems, he joined the Department of Homeland Security’s Science and Technology Directorate as a fellow with the American Association for the Advancement of Science (AAAS), where he was a pioneering member of a number of key programs. Dr. Brumer has received the Presidential Rank Award of Meritorious Executive, the Secretary of Defense Medal for Meritorious Civilian Service (with Bronze Palm), the Secretary of Defense Award for Excellence, the Space and Command, Control, Communications, Computers, Intelligence, Surveillance, and Reconnaissance High Impact Analysis Award, and the Daniel Wilson Scholarship in Chemistry, as well as other awards. When not in the Pentagon, Dr. Brumer enjoys spending time with his fantastic wife Kim and their two excellent children, Eliana and Netanel. |
Keynote![]() |
2020 | |||||||||
Morning Keynote |
Michael Seablom Chief Technologist Science Mission Directorate (bio)
Michael Seablom is the Chief Technologist for the Science Mission Directorate at NASA Headquarters. He has the responsibility for surveying and assessing technology needs for the Heliophysics, Astrophysics, Earth Science, and Planetary Science Divisions, and is the primary liaison to the NASA Office of Chief Technologist and the Space Technology Mission Directorate. |
Keynote![]() |
2020 | |||||||||
Opening Keynote |
Norton “Norty” Schwartz President and CEO IDA (bio)
General Norton A. Schwartz serves as President and CEO of the Institute for Defense Analyses (IDA), a nonprofit corporation operating in the public interest. IDA manages three Federally Funded Research and Development Centers that answer the most challenging U.S. security and science policy questions with objective analysis leveraging extraordinary scientific, technical, and analytic expertise. At IDA, General Schwartz (U.S. Air Force, retired) directs the activities of more than 1,000 scientists and technologists employed by IDA. General Schwartz has a long and prestigious career of service and leadership that spans over five decades. He was most recently President and CEO of Business Executives for National Security (BENS). During his 6-year tenure at BENS, he was also a member of IDA’s Board of Trustees. Prior to retiring from the U.S. Air Force, General Schwartz served as the 19th Chief of Staff of the U.S. Air Force from 2008 to 2012. He previously held senior joint positions as Director of the Joint Staff and as the Commander of the U.S. Transportation Command. He began his service as a pilot with the airlift evacuation out of Vietnam in 1975. General Schwartz is a U.S. Air Force Academy graduate and holds a master’s degree in business administration from Central Michigan University. He is also an alumnus of the Armed Forces Staff College and the National War College. He is a member of the Council on Foreign Relations and a 1994 Fellow of Massachusetts Institute of Technology’s Seminar XXI. General Schwartz has been married to Suzie since 1981. |
Keynote![]() |
2020 | |||||||||
Modeling Human-System Interaction in UAM: Design and Application of Human-Autonomy Teaming (Abstract)
Authors: Vincent E. Houston, Joshua L. Prinzel. Urban Air Mobility (UAM) is defined as “…a safe and efficient system for air passenger and cargo transportation within an urban area. It is inclusive of small package delivery and other urban unmanned aerial system services and supports a mix of onboard/ground-piloted and increasingly autonomous operations” [1] (p. 4). UAM operations likely require autonomous systems to enable functions ranging from simplified vehicle operations [2] to fleet and resource management [3]. Although automation has had a significant and ubiquitous role in aviation, it has been generally limited in capability and characterized by poor human-system design with numerous disastrous consequences [4]. Autonomy, however, represents a significant evolutionary step up from automation. Autonomous systems are characterized by the capabilities to “independently assume functions typically assigned to human operators, with less human intervention overall and for longer periods of time” [5]. Autonomous systems are self-directed, self-sufficient, and non-deterministic [6] [7]. The system requirements and architectures for UAM represent a variety of functions that can be termed “work-as-imagined,” characterizing the notion of how people think work is done, as distinct from how work is actually done [8]. Work-as-imagined is defined through three basic sources: experience of work-as-done; knowledge and understanding of work-as-prescribed; and exposure to work-as-disclosed [9]. Because UAM represents a revolutionary approach to aviation, the gap between work-as-imagined and, ultimately, work-as-done may be significant. An example is represented by the substantial system architecture concepts and technological solutions based on assumptions that UAM will be fully autonomous rather than increasingly autonomous in application. The emerging field of human-autonomy teaming represents a new paradigm that explores the various mechanisms by which humans and machines can work and think together [5] [10] [11]. A team is defined as “a distinguishable set of two or more agents who interact, dynamically, interdependently, and adaptively toward a common and valued goal/objective/mission” [11] (p. 4). The literature points to pitfalls associated with some automation implementation strategies (e.g., inadequate supervisory control, poor vigilance, skill loss, etc.). A comprehensive, coherent, cohesive, prioritized, research-driven, and empirically data-based approach has been hypothesized to ensure the future success of UAM, and this work shows that performance is better with human-autonomy teaming [11]. The proposal will discuss the burgeoning field of human-autonomy teaming with emphasis on the challenges of identifying data requirements and human-autonomy teaming research needs for UAM. The innovative vision of community air taxi operations currently remains primarily conceptual and ill-defined, with few practical and working prototypes. The Autonomous System Technologies for Resilient Airspace Operations (ASTRAO) effort describes one of NASA’s increasingly autonomous technologies showcasing the potential of the human-autonomy teaming design approach. ASTRAO is a Simplified Vehicle Operation [2] UAM application that utilizes machine learning and data algorithms, coupled with human-autonomy teaming principles and human factors optimization.
The technology research and development effort is intended to provide design solutions ranging from future air taxi flight decks, operated by less experienced and less extensively trained pilots, to remote supervisory operations in which many vehicles are managed by a single ground station / remote pilot and/or air traffic service provider. |
Vincent E. Houston Computer/Machine Learning Research Engineer NASA |
Poster | 2020 | |||||||||
Assessing the reliability of prediction intervals from Bayesian Neural Networks (Abstract)
Neural networks (NN) have become popular models because of their predictive power in a variety of applications. Users are beginning to use NN to automate tasks previously done by humans. One criticism of NN is they provide no uncertainty with their predictions, which is problematic in high risk applications. Bayesian neural networks (BNN) provide one approach to quantifying uncertainty by putting NN in a probabilistic framework through placing priors on all weights and computing posterior predictive distributions. We assess the quality of uncertainty given by BNN estimated using Markov Chain Monte Carlo (MCMC) and variational inference (VI) with a simulation study. These results are also compared to Concrete Dropout, another way to provide uncertainty for NN, and to a Gaussian Process model. The effect of network architecture on uncertainty quantification is also explored. BNN fit via MCMC gave uncertainty results similar to those of the Gaussian Process, which performed better than BNN fit via VI or Concrete Dropout. Results also show the significant effects of network architecture on interpolation and show additional issues with over- and underfitting. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. |
Daniel Ries Sandia National Lab |
Poster | 2020 | |||||||||
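The core of such a study is checking whether nominal prediction intervals achieve their advertised coverage. The R sketch below illustrates only that coverage check; it is not the authors' code, and a plain linear model stands in for the BNN, Concrete Dropout, and Gaussian process models the poster compares.

```r
## Minimal sketch of empirical coverage for 95% prediction intervals.
## A linear model is a stand-in; the study itself compares BNNs (MCMC and
## VI), Concrete Dropout, and Gaussian processes.
set.seed(1)
coverage_one_rep <- function(n = 100) {
  x  <- runif(n); y  <- 1 + 2 * x + rnorm(n, sd = 0.3)   # training data
  xs <- runif(n); ys <- 1 + 2 * xs + rnorm(n, sd = 0.3)  # test data
  fit  <- lm(y ~ x)
  pint <- predict(fit, newdata = data.frame(x = xs),
                  interval = "prediction", level = 0.95)
  mean(ys >= pint[, "lwr"] & ys <= pint[, "upr"])        # fraction covered
}
reps <- replicate(500, coverage_one_rep())
mean(reps)   # should sit near the nominal 0.95 for a well-calibrated model
```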
Creating formal characterizations of routine contingency management in commercial aviation (Abstract)
Traditional approaches to safety management focus on collection of data describing unwanted states (i.e., accidents and incidents) and analysis of undesired behaviors (i.e., faults and errors) that precede those states. Thus, in the traditional view of safety, safety is both defined and measured by its absence, namely the lack of safety. In extremely high confidence systems like commercial air transport, however, opportunities to measure the absence of safety are relatively rare. Ironically, a critical barrier to measuring safety and the impact of mitigation strategies in commercial aviation is the lack of opportunities for measurement. While traditional approaches to safety that focus only on minimizing undesired outcomes have proven utility, they represent, at best, an incomplete view of safety in complex sociotechnical domains such as aviation. For example, pilots and controllers successfully manage contingencies during routine, everyday operations that contribute to the safety of the national airspace system. However, events that result in successful outcomes are not systematically collected or analyzed. Characterization and measurement of routine safety-producing behaviors would create far more opportunities for measurement of safety, potentially increasing the temporal sensitivity, utility and forensics capability of safety assurance methods that can leverage these metrics. The current study describes an initial effort to characterize how pilots and controllers manage contingencies during routine everyday operations and specify that characterization within a cognitive architecture that could potentially be transformed into a formal framework for verification. Rather than focus on rare events in which things went wrong, this study focused on frequent events in which operators adjusted their work to ensure things went right. Namely, this study investigated how operators responded to expected and unexpected disturbances during Area Navigation (RNAV) arrivals into Charlotte Douglas International Airport (KCLT). Event reports submitted to NASA’s Aviation Safety Reporting System (ASRS) that referenced one or more of the KCLT RNAV arrivals were examined. The database search returned 29 event reports that described air carrier operations on one of the RNAV arrivals. Those 29 event reports included 39 narratives, which were examined to identify statements describing safety-producing performance using the Resilience Analysis Grid (RAG) framework (Hollnagel, 2011). The RAG identifies four basic capabilities of resilience performance: anticipating, monitoring for, responding to, and learning from disruptions. Analysis of the 39 ASRS narratives revealed 99 statements describing resilient behaviors, which were categorized to create a taxonomy of 19 resilient performance strategies. The strategies in this taxonomy can be classified and tagged, and can then be formally described as a scenario that leads to either the preservation or degradation of safety. These scenarios can be abstracted and translated into temporal logic formulae, which serve as procedural rules in a knowledge database. This procedure outlines the means by which a set of relational rules that represent the communal knowledge of the system are captured and utilized. We are then able to check whether the knowledge base is consistent and create classes and subclasses which allows for generalization of a particular strategic instance. This procedure enables the future development of a classifier inference engine. |
Jon Holbrook Cognitive Scientist NASA |
Poster | 2020 | |||||||||
Characterizing the Orbital Debris Environment Using Satellite Perturbation Anomaly Data (Abstract)
The untracked orbital debris environment has been described as one of the most serious risks to the survivability of satellites in high-traffic low Earth orbits, where acute satellite population growth is taking place. This paper describes a method for correlating observed satellite orbital changes with orbital debris impacts, and demonstrates how populations of small debris (< 1 cm) can be characterized by directly examining the orbit and attitude changes of individual satellites within constellations. The paper also presents means for detecting unusual movements and other anomalies (e.g., communication losses) in individual satellites and satellite constellations using the Space Surveillance Network, other space surveillance sensors, and in situ methods. Finally, the paper discusses how an anomaly data archive and policy repository might be established, supporting an improved definition of the orbital debris environment in harmony with the President’s Space Policy Directive 3. |
Joel Williamsen Research Staff Member IDA |
Poster | 2020 | |||||||||
CVN 78 Sortie Generation Rate |
Dean Thomas Deputy Director IDA |
Poster | 2020 | |||||||||
Analyzing Miss Distance |
Kevin Kirshenbaum Research Staff Member IDA |
Poster | 2020 | |||||||||
Build Better Graphics |
Brian Vickers Research Staff Member IDA |
Poster | 2020 | |||||||||
Overarching Tracker: A Trend Analysis of System Performance Data |
Caitlan Fealing Data Science Fellow IDA |
Poster | 2020 | |||||||||
Connecting Software Reliability Growth Models to Software Defect Tracking |
Melanie Luperon Student University of Massachusetts |
Poster | 2020 | |||||||||
A Simulation Study of Binary Response Models for the Small-Caliber Primer Sensitivity Test (Abstract)
In ammunition, the primer is a cartridge component containing explosive mix designed to initiate propellant. It is the start of the chain of events leading to projectile launch. Most primers are forcibly struck by a firing pin, and the sensitivity of the primer determines if the cartridge fires. The primer sensitivity test is used to determine whether a batch of primers is over- or under-sensitive. Over-sensitive primers can lead to accidental discharges, while under-sensitive primers can lead to misfires. The current Government test method relies on a “hand-calculation” based on the normal distribution to determine lot acceptance. Although a good approximation, it is not known if the true process follows a normal distribution. A simulation study was conducted to evaluate relative lot acceptance risk using the hand-calculation compared to various generalized linear models. It is shown that asymptotic behavior, and therefore lot acceptance, is very sensitive to model selection. This sensitivity is quantified and will be used to augment current empirical research. As a result, a more appropriate test method and acceptance model can be determined. |
Zach Krogstad Data Scientist U.S. Army |
Poster | 2020 | |||||||||
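The sensitivity to model choice described in this abstract can be illustrated with a toy simulation: the stimulus levels and the probit truth (mean 7, sd 1.5) below are hypothetical, not the Government test data, but they show how probit and logit links diverge in the extreme quantiles that drive lot acceptance.

```r
## Toy go/no-go simulation (hypothetical values, not the Government test):
## compare extreme-quantile estimates from probit vs. logit GLMs.
set.seed(1)
height <- rep(seq(2, 12, by = 1), each = 20)        # hypothetical stimulus levels
fired  <- rbinom(length(height), 1, pnorm(height, mean = 7, sd = 1.5))

probit <- glm(fired ~ height, family = binomial(link = "probit"))
logit  <- glm(fired ~ height, family = binomial(link = "logit"))

## Level at which 99.9% of primers are predicted to fire ("all-fire" level)
q999 <- function(fit, qfun) { b <- coef(fit); unname((qfun(0.999) - b[1]) / b[2]) }
c(probit = q999(probit, qnorm), logit = q999(logit, qlogis))
```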
Bayesian Analysis (Abstract)
This course will cover the basics of the Bayesian approach to practical and coherent statistical inference. Particular attention will be paid to computational aspects, including MCMC. Examples and practical hands-on exercises will run the gamut from toy illustration to real-world data analysis from all areas of science, with R implementations/coaching provided. The course closely follows P.D. Hoff’s “A First Course in Bayesian Statistical Methods” (Springer, 2009). Some examples are borrowed from two other texts, which are nice references to have: J. Albert’s “Bayesian Computation with R” (Springer, 2nd ed., 2009) and A. Gelman, J.B. Carlin, H.S. Stern, D. Dunson, A. Vehtari, and D.B. Rubin’s “Bayesian Data Analysis” (3rd ed., 2013). |
Dr. Robert Gramacy Virginia Tech |
Short Course | materials | 2019 | ||||||||
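For a flavor of the computational side of the course, here is a self-contained random-walk Metropolis sampler in R for a binomial success probability with a uniform prior; it is an illustration in the spirit of the course, not taken from its materials.

```r
## Random-walk Metropolis for theta in Binomial(n, theta) with a Uniform(0,1) prior.
set.seed(1)
y <- 7; n <- 10                               # e.g., 7 successes in 10 trials
log_post <- function(theta) {
  if (theta <= 0 || theta >= 1) return(-Inf)  # outside the prior support
  dbinom(y, n, theta, log = TRUE)             # flat prior adds only a constant
}
draws <- numeric(5000); theta <- 0.5
for (i in seq_along(draws)) {
  prop <- theta + rnorm(1, sd = 0.1)          # symmetric proposal
  if (log(runif(1)) < log_post(prop) - log_post(theta)) theta <- prop
  draws[i] <- theta
}
mean(draws); quantile(draws, c(0.025, 0.975)) # compare to the exact Beta(8, 4) posterior
```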
Categorical Data Analysis (Abstract)
Categorical data is abundant in the 21st century, and its analysis is vital to advance research across many domains. Thus, data-analytic techniques that are tailored for categorical data are an essential part of the practitioner’s toolset. The purpose of this short course is to help attendees develop and sharpen their abilities with these tools. Topics covered in this short course will include logistic regression, ordinal regression, and classification, and methods to assess predictive accuracy of these approaches will be discussed. Data will be analyzed using the R software package, and course content will loosely follow Alan Agresti’s excellent textbook An Introduction to Categorical Data Analysis, Third Edition. |
Dr. Christopher Franck Virginia Tech |
Short Course | materials | 2019 | ||||||||
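A minimal example of the kind of analysis covered: fitting a logistic regression in R to the built-in mtcars data and checking held-out classification accuracy (an illustration only, not the course's own material).

```r
## Logistic regression with a simple train/test split and an accuracy check.
set.seed(1)
idx   <- sample(nrow(mtcars), 22)
train <- mtcars[idx, ]; test <- mtcars[-idx, ]
fit   <- glm(am ~ wt + hp, data = train, family = binomial)  # transmission type
summary(fit)                                                  # log-odds coefficients
phat  <- predict(fit, newdata = test, type = "response")
mean((phat > 0.5) == test$am)                                 # held-out accuracy
```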
Multivariate Data Analysis (Abstract)
In this one-day workshop, we will explore five techniques that are commonly used to model human behavior: principal component analysis, factor analysis, cluster analysis, mixture modeling, and multidimensional scaling. Brief discussions of the theory of each method will be provided, along with some examples showing how the techniques work and how the results are interpreted in practice. Accompanying R-code will be provided so attendees are able to implement these methods on their own. |
Dr. Doug Steinley University of Missouri |
Short Course | materials | 2019 | ||||||||
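Two of the five techniques (principal component analysis and cluster analysis) can be illustrated in a few lines of base R on the built-in iris data; this is a sketch in the workshop's spirit, not its accompanying R code.

```r
## PCA plus k-means clustering on the iris measurements.
X   <- scale(iris[, 1:4])            # standardize the four variables
pca <- prcomp(X)
summary(pca)                         # proportion of variance per component
set.seed(1)
km  <- kmeans(X, centers = 3, nstart = 25)
table(km$cluster, iris$Species)      # clusters vs. known species labels
plot(pca$x[, 1:2], col = km$cluster,
     xlab = "PC1", ylab = "PC2", main = "k-means clusters in PCA space")
```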
Statistical Methods for Modeling and Simulation Verification and Validation (Abstract)
Statistical Methods for Modeling and Simulation Verification and Validation is a 1-day tutorial in applied statistical methods for the planning, design, and analysis of simulation experiments and live test events for the purposes of verifying and validating models and simulations. The course covers the fundamentals of verification and validation of models and simulations, as it is currently practiced and as is suggested for future applications. The first session is largely an introduction to modeling and simulation concepts and verification and validation policies, along with a basic introduction to data visualization and statistical methods. Session 2 covers the essentials of experiment design for simulations and live test events. The final session focuses on analysis techniques appropriate for designed experiments and tests, as well as observational data, for the express purpose of simulation validation. We look forward to your participation in this course. |
Dr. Jim Simpson, Dr. Jim Wisnowski, and Dr. Stargel Doane JK Analytics LLC, Adsurgo LLC, Analytical Arts LLC |
Short Course | materials | 2019 | ||||||||
Uncertainty Quantification (Abstract)
We increasingly rely on mathematical and statistical models to predict phenomena ranging from nuclear power plant design to profits made in financial markets. When assessing the feasibility of these predictions, it is critical to quantify uncertainties associated with the models, inputs to the models, and data used to calibrate the models. The synthesis of statistical and mathematical techniques, which can be used to quantify input and response uncertainties for simulation codes that can take hours to days to run, comprises the evolving field of uncertainty quantification. The use of data, to improve the predictive accuracy of models, is central to uncertainty quantification so we will begin by providing an overview of how Bayesian techniques can be used to construct distributions for model inputs. We will subsequently describe the computational issues associated with propagating these distributions through complex models to construct prediction intervals for statistical quantities of interest such as expected profits or maximal reactor temperatures. Finally, we will describe the use of sensitivity analysis to isolate critical model inputs and surrogate model construction for simulation codes that are too complex for direct statistical analysis. All topics will be motivated by examples arising in engineering, biology, and economics. |
Dr. Ralph Smith North Carolina State University |
Short Course | materials | 2019 | ||||||||
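The propagation step described above can be sketched in a few lines: sample the uncertain inputs, push them through the simulation model, and summarize the induced output distribution. The exponential-decay "model" and the input distributions below are toy assumptions, not taken from the course.

```r
## Forward uncertainty propagation through a toy model by Monte Carlo.
set.seed(1)
n  <- 10000
k  <- rnorm(n, mean = 2.0, sd = 0.1)             # uncertain input 1 (a rate)
x0 <- rnorm(n, mean = 1.0, sd = 0.2)             # uncertain input 2 (initial value)
model <- function(k, x0, t = 1) x0 * exp(-k * t) # stand-in for an expensive simulator
y  <- model(k, x0)
quantile(y, c(0.025, 0.5, 0.975))                # 95% interval for the output
cor(cbind(k, x0), y)                             # crude input-sensitivity screen
```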
Design of Experiments (Abstract)
Overview/Course Outcomes: Well-designed experiments are a powerful tool for developing and validating cause and effect relationships when evaluating and improving product and process performance and for operational testing of complex systems. Designed experiments are the only efficient way to verify the impact of changes in product or process factors on actual performance. The course outcomes are: • Ability to plan and execute experiments • Ability to collect data and analyze and interpret these data to provide the knowledge required for business success • Knowledge of a wide range of modern experimental tools that enable practitioners to customize their experiment to meet practical resource constraints. The topics covered during the course are: • Fundamentals of DOX – randomization, replication, and blocking • Planning for a designed experiment – type and size of design, factor selection, levels and ranges, response measurement, sample sizes • Graphical and statistical approaches to DOX analysis • Blocking to eliminate the impact of nuisance factors on experimental results • Factorial experiments and interactions • Fractional factorials – efficient and effective use of experimental resources • Optimal designs • Response surface methods • A demonstration illustrating and comparing the effectiveness of different experimental design strategies. This course is focused on helping you and your organization make the most effective utilization of DOX. Software usage is fully integrated into the course. Who Should Attend: The course is suitable for participants from an engineering or technical background. Participants will need some previous experience and background in statistical methods. Reference Materials: The course is based on the textbook Design and Analysis of Experiments, 9th Edition, by Douglas C. Montgomery. JMP Software will be discussed and illustrated. |
Dr. Doug Montgomery and Dr. Caleb King Arizona State University, JMP |
Short Course | 2019 | |||||||||
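As a small taste of the factorial material, the R sketch below builds a replicated 2^3 factorial in coded units, simulates a response (the effect sizes are invented), and estimates effects by regression; the course itself illustrates these ideas in JMP rather than with this code.

```r
## Replicated 2^3 factorial in coded (-1, +1) units with a simulated response.
set.seed(1)
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
design <- design[rep(1:8, times = 2), ]                 # two replicates
design$y <- with(design, 10 + 3*A - 2*B + 1.5*A*B + rnorm(16))
fit <- lm(y ~ A * B * C, data = design)
summary(fit)   # coefficients in coded units are half the factorial effects
```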
Overview of Design of Experiments (Abstract)
Well-designed experiments are a powerful tool for developing and validating cause and effect relationships when evaluating and improving product and process performance and for operational testing of complex systems. Designed experiments are the only efficient way to verify the impact of changes in product or process factors on actual performance. This course is focused on helping you and your organization make the most effective utilization of DOX. Software usage is fully integrated into the course. |
Dr. Doug Montgomery Professor Arizon State University (bio)
Douglas C. Montgomery, Ph.D., is Regent’s Professor of Industrial Engineering and Statistics and ASU Foundation Professor of Engineering at Arizona State University. He was the John M. Fluke Distinguished Professor of Engineering, Director of Industrial Engineering and Professor of Mechanical Engineering at the University of Washington in Seattle. He was a Professor of Industrial and Systems Engineering at the Georgia Institute of Technology. He holds BSIE, MS and Ph.D. degrees from Virginia Tech. Dr. Montgomery’s industrial experience includes engineering assignments with Union Carbide Corporation and Eli Lilly and Company. He also has extensive consulting experience. Dr. Montgomery’s professional interests focus on industrial statistics, including design of experiments, quality and reliability engineering, applications of linear models, and time series analysis and forecasting. He also has interests in operations research and statistical methods applied to modeling and analyzing manufacturing systems. He was a Visiting Professor of Engineering at the Monterey Institute of Technology in Monterey, Mexico, and a University Distinguished Visitor at the University of Manitoba. Dr. Montgomery has conducted basic research in empirical stochastic modeling, process control, and design of experiments. The Department of Defense, the Office of Naval Research, the National Science Foundation, the United States Army, and private industry have sponsored his research. He has supervised 66 doctoral dissertations and over 40 MS theses and MS Statistics Projects. |
Short Course | materials | 2016 | ||||||||
Experiences in Reliability Analysis (Abstract)
Reliability assurance processes in manufacturing industries require data-driven information for making product-design decisions. Life tests, accelerated life tests, and accelerated degradation tests are commonly used to collect reliability data. Data from products in the field provide another important source of useful reliability information. Due to complications like censoring, multiple failure modes, and the need for extrapolation, these reliability studies typically yield data that require special statistical methods. This presentation will describe the analyses of a collection of different life data analysis applications in the area of product reliability. Methods used in the analyses include Weibull and lognormal analysis, analysis of data with multiple failure modes, accelerated test analysis, analysis of both repeated measures and destructive degradation data and the analysis of recurrence data from repairable systems. |
Dr. Bill Meeker Professor Iowa State University (bio)
William Meeker is Professor of Statistics and Distinguished Professor of Liberal Arts and Sciences at Iowa State University. He is a Fellow of the American Statistical Association, the American Society for Quality, and the American Association for the Advancement of Science, and a past Editor of Technometrics. He is co-author of the books Statistical Methods for Reliability Data with Luis Escobar (1998), and Statistical Intervals with Gerald Hahn (1991), and of numerous publications in the engineering and statistical literature. He has won numerous awards for his research and service, including the Brumbaugh, Hunter, Sacks, Shewhart, Youden, and Wilcoxon awards. He has done research and consulted extensively on problems in reliability data analysis, warranty analysis, accelerated testing, nondestructive evaluation, and statistical computing. |
Short Course | materials | 2016 | ||||||||
Advanced Regression (Abstract)
This course assumes that the students have previous exposure to simple and multiple linear regression topics, at least from their undergraduate education; however, the course does not assume that the students are very current with this material. The course goal is to provide more insights and details into some of the more important topics. The presentation emphasizes the use of software for performing the analysis. |
Dr. Geoff Vining Professor Virginia Tech University (bio)
Geoff Vining is a Professor of Statistics at Virginia Tech. From 1999 – 2006, he also was the department head. He currently is a member of the American Society for Quality (ASQ) Board of Directors and Past-Chair of the ASQ Technical Communities Council. In 2016, he serves as the ASQ Treasurer. He is a Fellow of the ASQ, a Fellow of the American Statistical Association (ASA), and an Elected Member of the International Statistical Institute. Dr. Vining served as Editor of the Journal of Quality Technology from 1998 – 2000 and as Editor-in-Chief of Quality Engineering from 2008-2009. Dr. Vining has authored or co-authored three textbooks, all in multiple editions. He has won several of the most important awards in industrial statistics/quality engineering, including the ASQ’s Shewhart Medal, Brumbaugh Award, and Hunter Award, along with the ENBIS Box Medal. He is an internationally recognized expert in the use of experimental design for quality and productivity improvement and in the application of statistical process control. He has extensive consulting experience, most recently with NASA and the U.S. Department of Defense. |
Short Course | materials | 2016 | ||||||||
Introduction to Bayesian (Abstract)
This course will cover the basics of the Bayesian approach to practical and coherent statistical inference. Particular attention will be paid to computational aspects, including MCMC. Examples will run the gamut from toy illustration to real-world data analysis from all areas of science, with R implementations provided. |
Dr. Robert Gramacy Associate Professor University of Chicago (bio)
Professor Gramacy is an Associate Professor of Econometrics and Statistics in the Booth School of Business, and a fellow of the Computation Institute at The University of Chicago. His research interests include Bayesian modeling methodology, statistical computing, Monte Carlo inference, nonparametric regression, sequential design, and optimization under uncertainty. He specializes in areas of real-data analysis where the ideal modeling apparatus is impractical, or where the current solutions are inefficient and thus skimp on fidelity. |
Short Course | 2016 | |||||||||
Introduction To Design of Experiments (Abstract)
Well-designed experiments are a powerful tool for developing and validating cause and effect relationships when evaluating and improving product and process performance and for operational testing of complex systems. Designed experiments are the only efficient way to verify the impact of changes in product or process factors on actual performance. |
Dr. Doug Montgomery Professor Arizona State University |
Short Course | 2017 | |||||||||
Introduction To Design of Experiments (Abstract)
Well-designed experiments are a powerful tool for developing and validating cause and effect relationships when evaluating and improving product and process performance and for operational testing of complex systems. Designed experiments are the only efficient way to verify the impact of changes in product or process factors on actual performance. |
Dr. Brad Jones Professor Arizona State University |
Short Course | 2017 | |||||||||
Data Farming (Abstract)
This tutorial is designed for newcomers to simulation-based experiments. Data farming is the process of using computational experiments to “grow” data, which can then be analyzed using statistical and visualization techniques to obtain insight into complex systems. The focus of the tutorial will be on gaining practical experience with setting up and running simulation experiments, leveraging recent advances in large-scale simulation experimentation pioneered by the Simulation Experiments & Efficient Designs (SEED) Center for Data Farming at the Naval Postgraduate School (http://harvest.nps.edu). Participants will be introduced to fundamental concepts, and jointly explore simulation models in an interactive setting. Demonstrations and written materials will supplement guided, hands-on activities through the setup, design, data collection, and analysis phases of an experiment-driven simulation study. |
Dr. Susan Sanchez Naval Postgraduate School |
Short Course | materials | 2017 | ||||||||
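A minimal sketch of the "grow data" idea: spread design points over the input space with a simple random Latin hypercube, run each point through the simulation, and collect the responses. The two inputs and the toy response function below are hypothetical; real data-farming studies use the SEED Center's nearly orthogonal Latin hypercubes and actual simulation models.

```r
## Basic random Latin hypercube over two hypothetical simulation inputs.
set.seed(1)
lhs <- function(n, k)                        # one point per stratum in each dimension
  sapply(1:k, function(j) (sample(n) - runif(n)) / n)
X <- lhs(n = 30, k = 2)
runs <- data.frame(arrival_rate = 1 + 9 * X[, 1],      # map [0,1] to input ranges
                   n_servers    = round(1 + 4 * X[, 2]))
sim <- function(arrival_rate, n_servers)               # toy stand-in simulation
  arrival_rate / n_servers + rnorm(1, sd = 0.1)
runs$response <- mapply(sim, runs$arrival_rate, runs$n_servers)
head(runs)    # "farmed" data ready for statistical analysis and visualization
```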
Split-Plot and Restricted Randomization Designs (Abstract)
Have you ever built what you considered to be the ideal designed experiment, then passed it along to be run and learn later that your recommended run order was ignored? Or perhaps you were part of a test execution team and learned too late that one or more of your experimental factors are difficult or time-consuming to change. We all recognize that the best possible guard against lurking background noise is complete randomization, but often we find that a randomized run order is extremely impractical or even infeasible. Split-plot design and analysis methods have been around for over 80 years, but only in the last several years have the methods fully matured and been made available in commercial software. This class will introduce you to the world of practical split-plot design and analysis methods. We’ll provide you the skills to effectively build designs appropriate to your specific needs and demonstrate proper analysis techniques using general linear models, available in the statistical software. Topics include split-plots for 2-level and mixed-level factor sets, for first and second order models, as well as split-split-plot designs. |
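As a rough illustration of the "general linear model" analysis the abstract mentions, the sketch below fits a split-plot structure as a mixed model with a whole-plot random effect. The lme4 package, the simulated data, and the factor names are assumptions chosen for illustration; the course's own software and examples may differ.

```r
# Split-plot analysis sketch: a random whole-plot intercept supplies the whole-plot error stratum
# (assumes the lme4 package; data and factor names are hypothetical)
library(lme4)

set.seed(5)
d <- expand.grid(wp = factor(1:8), coat = factor(c("A", "B")))  # 8 whole plots, 2 subplot runs each
d$temp <- ifelse(as.integer(d$wp) %% 2 == 0, "high", "low")     # hard-to-change factor, fixed per whole plot
d$y <- 10 + 2 * (d$temp == "high") + 1 * (d$coat == "B") +
       rnorm(8)[as.integer(d$wp)] +                             # whole-plot error
       rnorm(nrow(d), sd = 0.5)                                 # subplot error

fit <- lmer(y ~ temp * coat + (1 | wp), data = d)
summary(fit)   # the whole-plot factor (temp) is judged against the larger whole-plot variance
```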
Dr. Jim Simpson JK Analytics |
Short Course | materials | 2017 | ||||||||
Statistical Models for Reliability Data (Abstract)
Engineers in manufacturing industries require data-driven reliability information for making business, product-design, and engineering decisions. The owners and operators of fleets of systems also need reliability information to make good decisions. The course will focus on concepts, examples, models, data analysis, and interpretation of reliability data analyses. Examples and exercises will include product field (maintenance or warranty) data and accelerated life test data. After completing this course, participants will be able to recognize and properly deal with different kinds of reliability data and properly interpret important reliability metrics. Topics will include the use of probability plots to identify appropriate distributional models (e.g., Weibull and lognormal distributions), estimating important quantities like distribution quantiles and failure probabilities, the analysis of data with multiple failure modes, and the analysis of recurrence data from a fleet of systems or a reliability growth program. |
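As a small, hedged illustration of the kind of analysis described, the sketch below fits a Weibull distribution to right-censored life data and extracts a low quantile (a B10 life). The survival package and the toy failure times are assumptions for illustration only.

```r
# Weibull fit to right-censored life data and a B10 estimate
# (assumes the survival package; the hours below are hypothetical)
library(survival)

hours  <- c(120, 190, 260, 340, 430, 510, 600, 600, 600, 600)  # 600 = still running at test end
status <- c(  1,   1,   1,   1,   1,   1,   0,   0,   0,   0)  # 1 = failure, 0 = right-censored

fit <- survreg(Surv(hours, status) ~ 1, dist = "weibull")

# Estimated time by which 10% of units fail (B10 life)
predict(fit, newdata = data.frame(1), type = "quantile", p = 0.10)
```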
Dr. William Meeker Professor Iowa State University |
Short Course | materials | 2017 | ||||||||
Data Visualization (Abstract)
Data visualization allows us to quickly explore and discover relationships graphically and interactively. We will provide the foundations for creating better graphical information to accelerate the insight discovery process and enhance the understandability of reported results. First principles and the “human as part of the system” aspects of information visualization from multiple leading sources such as Harvard Business Review, Edward Tufte, and Stephen Few will be explored using representative example data sets. We will discuss best practices for graphical excellence to most effectively, clearly, and efficiently communicate your story. We will explore visualizations applicable across the conference themes (computational modeling, DOE, statistical engineering, modeling & simulation, and reliability) for univariate, multivariate, time-dependent, and geographical data. |
Dr. Jim Wisnowski Adsurgo |
Short Course | materials | 2017 | ||||||||
Modern Response Surface Methods & Computer Experiments (Abstract)
This course details statistical techniques at the interface between mathematical modeling via computer simulation, computer model meta-modeling (i.e., emulation/surrogate modeling), calibration of computer models to data from field experiments, and model-based sequential design and optimization under uncertainty (a.k.a. Bayesian Optimization). The treatment will include some of the historical methodology in the literature, and canonical examples, but will primarily concentrate on modern statistical methods, computation and implementation, as well as modern application/data type and size. The course will return at several junctures to real-world experiments coming from the physical and engineering sciences, such as studying the aeronautical dynamics of a rocket booster re-entering the atmosphere; modeling the drag on satellites in orbit; designing a hydrological remediation scheme for water sources threatened by underground contaminants; studying the formation of supernovae via radiative shock hydrodynamics. The course material will emphasize deriving and implementing methods over proving theoretical properties. |
Dr. Robert Gramacy Virginia Polytechnic Institute and State University |
Short Course | materials | 2018 | ||||||||
Overview of Design of Experiments (Abstract)
Well-designed experiments are a powerful tool for developing and validating cause and effect relationships when evaluating and improving product and process performance and for operational testing of complex systems. Designed experiments are the only efficient way to verify the impact of changes in product or process factors on actual performance. The course outcomes are: • Ability to plan and execute experiments. • Ability to collect data and analyze and interpret these data to provide the knowledge required for business success. • Knowledge of a wide range of modern experimental tools that enable practitioners to customize their experiment to meet practical resource constraints. The topics covered during the course are: • Fundamentals of DOX – randomization, replication, and blocking. • Planning for a designed experiment – type and size of design, factor selection, levels and ranges, response measurement, sample sizes. • Graphical and statistical approaches to DOX analysis. • Blocking to eliminate the impact of nuisance factors on experimental results. • Factorial experiments and interactions. • Fractional factorials – efficient and effective use of experimental resources. • Optimal designs. • Response surface methods. • A demonstration illustrating and comparing the effectiveness of different experimental design strategies. This course is focused on helping you and your organization make the most effective utilization of DOX. Software usage is fully integrated into the course. |
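For readers new to the topic, a minimal base-R sketch of a randomized, replicated two-level factorial and its analysis follows; the factors, replication, and placeholder response are illustrative assumptions, not course content.

```r
# A replicated 2^3 full factorial, randomized and analyzed in base R (toy illustration)
set.seed(8)
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), rep = 1:2)
d <- d[sample(nrow(d)), ]              # randomize the run order

# ... run the experiment and record the response; a placeholder is simulated here ...
d$y <- 5 + 2 * d$A + 1.5 * d$A * d$B + rnorm(nrow(d))

fit <- lm(y ~ A * B * C, data = d)     # main effects and all interactions
summary(fit)
```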
Dr. Bradley Jones Distinguished Research Fellow JMP Division/SAS |
Short Course | materials | 2018 | ||||||||
Overview of Design of Experiments (Abstract)
Well-designed experiments are a powerful tool for developing and validating cause and effect relationships when evaluating and improving product and process performance and for operational testing of complex systems. Designed experiments are the only efficient way to verify the impact of changes in product or process factors on actual performance. The course outcomes are: • Ability to plan and execute experiments. • Ability to collect data and analyze and interpret these data to provide the knowledge required for business success. • Knowledge of a wide range of modern experimental tools that enable practitioners to customize their experiment to meet practical resource constraints. The topics covered during the course are: • Fundamentals of DOX – randomization, replication, and blocking. • Planning for a designed experiment – type and size of design, factor selection, levels and ranges, response measurement, sample sizes. • Graphical and statistical approaches to DOX analysis. • Blocking to eliminate the impact of nuisance factors on experimental results. • Factorial experiments and interactions. • Fractional factorials – efficient and effective use of experimental resources. • Optimal designs. • Response surface methods. • A demonstration illustrating and comparing the effectiveness of different experimental design strategies. This course is focused on helping you and your organization make the most effective utilization of DOX. Software usage is fully integrated into the course. |
Dr. Doug Montgomery Regents’ Professor of Industrial Engineering and Statistics ASU Foundation Professor of Engineering Arizona State University |
Short Course | materials | 2018 | ||||||||
Introduction to R (Abstract)
This course is designed to introduce participants to the R programming language and the RStudio editor. R is free and open-source software for summarizing data, creating visuals of data, and conducting statistical analyses. R can offer many advantages over programs such as Excel, including faster computation, customized analyses, access to the latest statistical techniques, automation of tasks, and the ability to easily reproduce research. After completing this course, a new user should be able to: • Import/export data from/to external files. • Create and manipulate new variables. • Conduct basic statistical analyses (such as t-tests and linear regression). • Create basic graphs. • Install and use R packages. Participants should bring a laptop for the interactive components of the course. |
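A compact base-R sketch of the tasks listed above follows; the data frame, variable names, and output file are hypothetical stand-ins.

```r
# Manipulate/analyze/plot/export, all in base R (hypothetical data)
set.seed(4)
scores <- data.frame(group = rep(c("A", "B"), each = 10),
                     hours = runif(20, 1, 10),
                     part1 = rnorm(20, 70, 8),
                     part2 = rnorm(20, 75, 8))

scores$total <- scores$part1 + scores$part2             # create a new variable

t.test(total ~ group, data = scores)                    # basic analysis: two-sample t-test
summary(lm(total ~ hours, data = scores))               # linear regression

hist(scores$total)                                      # basic graph
write.csv(scores, "scores_out.csv", row.names = FALSE)  # export to an external file
```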
Dr. Justin Post North Carolina State University |
Short Course | materials | 2018 | ||||||||
Survey Construction and Analysis (Abstract)
In this course, we introduce the main concepts of the survey methodology process – from survey sampling design to analyzing the data obtained from complex survey designs. The course topics include: 1. Introduction to the Survey Process 2. R Tools 3. Sampling Designs – Simple Random Sampling, Cluster Sampling, Stratified Sampling, and more 4. Weighting and Variance Estimation 5. Exploratory Data Analysis 6. Complex Survey Analysis We use a combination of lectures and hands-on exercises using R. Students are expected to have R and associated packages installed on their computers. We will send a list of required packages before the course. We also use data from Department of Defense surveys, where appropriate. |
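To give a flavor of the design-based analysis described above, here is a small sketch using the survey package; the package choice, strata, weights, and data are assumptions for illustration and are not drawn from the course.

```r
# Weighted estimate from a stratified sample (assumes the survey package; toy data)
library(survey)

d <- data.frame(stratum   = rep(c("officer", "enlisted"), each = 4),
                wt        = rep(c(10, 50), each = 4),     # sampling weights
                satisfied = c(1, 1, 0, 1, 0, 1, 1, 0))

dsgn <- svydesign(ids = ~1, strata = ~stratum, weights = ~wt, data = d)

svymean(~satisfied, dsgn)   # weighted mean with a design-based standard error
```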
Dr. Wendy Martinez Bureau of Labor Statistics |
Short Course | materials | 2018 | ||||||||
Survey Construction and Analysis (Abstract)
In this course, we introduce the main concepts of the survey methodology process – from survey sampling design to analyzing the data obtained from complex survey designs. The course topics include: 1. Introduction to the Survey Process 2. R Tools 3. Sampling Designs – Simple Random Sampling, Cluster Sampling, Stratified Sampling, and more 4. Weighting and Variance Estimation 5. Exploratory Data Analysis 6. Complex Survey Analysis We use a combination of lectures and hands-on exercises using R. Students are expected to have R and associated packages installed on their computers. We will send a list of required packages before the course. We also use data from Department of Defense surveys, where appropriate. |
Dr. MoonJung Cho Bureau of Labor Statistics |
Short Course | materials | 2018 | ||||||||
Uncertainty Quantification (Abstract)
We increasingly rely on mathematical and statistical models to predict phenomena ranging from nuclear power plant design to profits made in financial markets. When assessing the feasibility of these predictions, it is critical to quantify uncertainties associated with the models, inputs to the models, and data used to calibrate the models. The synthesis of statistical and mathematical techniques, which can be used to quantify input and response uncertainties for simulation codes that can take hours to days to run, comprises the evolving field of uncertainty quantification. The use of data, to improve the predictive accuracy of models, is central to uncertainty quantification so we will begin by providing an overview of how Bayesian techniques can be used to construct distributions for model inputs. We will subsequently describe the computational issues associated with propagating these distributions through complex models to construct prediction intervals for statistical quantities of interest such as expected profits or maximal reactor temperatures. Finally, we will describe the use of sensitivity analysis to isolate critical model inputs and surrogate model construction for simulation codes that are too complex for direct statistical analysis. All topics will be motivated by examples arising in engineering, biology, and economics. |
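A bare-bones base-R sketch of the propagation step described above follows; the "model" and the input distributions are hypothetical stand-ins for an expensive simulation code.

```r
# Monte Carlo propagation of input uncertainty through a cheap stand-in model
set.seed(42)
n <- 10000
k <- rnorm(n, mean = 1.2, sd = 0.1)      # uncertain input 1
q <- runif(n, min = 0.8, max = 1.0)      # uncertain input 2

model <- function(k, q) k^2 * exp(-q)    # placeholder for an expensive simulation

out <- model(k, q)
quantile(out, c(0.025, 0.5, 0.975))      # prediction interval for the output
cor(cbind(k, q), out)                    # crude sensitivity screen: input-output correlation
```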
Dr. Ralph Smith North Carolina State University |
Short Course | materials | 2018 | ||||||||
Categorical Data Analysis (Abstract)
Categorical data is abundant in the 21st century, and its analysis is vital to advance research across many domains. Thus, data-analytic techniques that are tailored for categorical data are an essential part of the practitioner’s toolset. The purpose of this short course is to help attendees develop and sharpen their abilities with these tools. Topics covered in this short course will include logistic regression, ordinal regression, and classification, and methods to assess predictive accuracy of these approaches will be discussed. Data will be analyzed using the R software package, and course content will loosely follow Alan Agresti’s excellent textbook An Introduction to Categorical Data Analysis, Third Edition. |
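As a small illustration of one of the listed topics, here is a logistic regression sketch in base R with a naive accuracy check; the simulated hit/miss data are purely illustrative.

```r
# Logistic regression for a binary response (toy simulated data)
set.seed(11)
trials <- data.frame(range = runif(100, 1, 10))
trials$hit <- rbinom(100, 1, plogis(3 - 0.5 * trials$range))  # hit probability falls with range

fit <- glm(hit ~ range, data = trials, family = binomial)
summary(fit)                               # coefficients on the log-odds scale

p_hat <- predict(fit, type = "response")   # predicted hit probabilities
mean((p_hat > 0.5) == trials$hit)          # naive classification accuracy
```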
Chris Franck Virginia Tech (bio)
Chris Franck is an assistant professor in the Department of Statistics at Virginia Tech. His research focuses on Bayesian model selection and averaging, objective Bayes, and spatial statistics. Much of his work has a specific emphasis on health applications. |
Shortcourse![]() |
2020 | |||||||||
A Practical Introduction To Gaussian Process Regression (Abstract)
Gaussian process regression is ubiquitous in spatial statistics, machine learning, and the surrogate modeling of computer simulation experiments. Fortunately, the prowess of Gaussian processes as accurate predictors, along with an appropriate quantification of uncertainty, does not derive from difficult-to-understand methodology and cumbersome implementation. We will cover the basics, and provide a practical tool-set ready to be put to work in diverse applications. The presentation will involve accessible slides authored in Rmarkdown, with reproducible examples spanning bespoke implementation to add-on packages. |
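In the spirit of the "bespoke implementation" the abstract mentions, here is a bare-bones GP regression sketch in base R; the squared-exponential kernel, lengthscale, nugget, and toy data are illustrative choices, not the course's own example.

```r
# Minimal GP regression: posterior predictive mean and pointwise 95% band (base R)
sqexp <- function(x1, x2, ell = 0.2) exp(-outer(x1, x2, "-")^2 / (2 * ell^2))

set.seed(6)
x  <- seq(0, 1, length.out = 8)
y  <- sin(2 * pi * x) + rnorm(8, sd = 0.1)            # toy training data
xx <- seq(0, 1, length.out = 100)                     # prediction grid

K    <- sqexp(x, x) + diag(1e-6 + 0.1^2, length(x))   # kernel matrix plus nugget/noise variance
Kxx  <- sqexp(xx, x)
mu   <- Kxx %*% solve(K, y)                           # predictive mean
Sig  <- sqexp(xx, xx) - Kxx %*% solve(K, t(Kxx))      # predictive covariance
band <- 1.96 * sqrt(pmax(diag(Sig), 0))               # pointwise 95% band
```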
Robert “Bobby” Gramacy Virginia Tech (bio)
Robert Gramacy is a Professor of Statistics in the College of Science at Virginia Polytechnic Institute and State University (Virginia Tech). Previously he was an Associate Professor of Econometrics and Statistics at the Booth School of Business, and a fellow of the Computation Institute at The University of Chicago. His research interests include Bayesian modeling methodology, statistical computing, Monte Carlo inference, nonparametric regression, sequential design, and optimization under uncertainty. Professor Gramacy is a computational statistician. He specializes in areas of real-data analysis where the ideal modeling apparatus is impractical, or where the current solutions are inefficient and thus skimp on fidelity. Such endeavors often require new models, new methods, and new algorithms. His goal is to be impactful in all three areas while remaining grounded in the needs of a motivating application. His aim is to release general purpose software for consumption by the scientific community at large, not only other statisticians. Professor Gramacy is the primary author on six R packages available on CRAN, two of which (tgp and monomvn) have won awards from statistical and practitioner communities. |
Shortcourse![]() |
![]() | 2020 | ||||||||
Network Analysis (Abstract)
Understanding the connections and dependencies that exist in our data is becoming ever more important. This one-day course on network analysis will introduce many of the basic concepts of networks, including descriptive statistics (e.g., centrality, prestige, etc.), community detection, and an introduction to nonparametric inferential tests. Additionally, cutting-edge methods for creating so-called “psychometric networks” that focus on the connections between variables will be covered. Throughout, we will discuss visualization methods that can highlight the nature of connections between entities in the network, whether they are observations, variables, or both. |
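A brief sketch of the descriptive statistics and community detection mentioned above, using the igraph package; the package choice and the toy edge list are assumptions for illustration.

```r
# Basic network descriptives and community detection (assumes igraph; toy edge list)
library(igraph)

edges <- data.frame(from = c("a", "a", "b", "c", "d", "e"),
                    to   = c("b", "c", "c", "d", "e", "c"))
g <- graph_from_data_frame(edges, directed = FALSE)

degree(g)            # centrality: number of connections
betweenness(g)       # centrality: bridging role on shortest paths
cluster_louvain(g)   # community detection
```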
Doug Steinley University of Missouri (bio)
Doug Steinley is a Professor in the Psychological Sciences Department at the University of Missouri. His research focuses on multivariate statistical methodology, with a primary interest in cluster analysis and social network analysis. His research in cluster analysis focuses on both traditional cluster analytic procedures (e.g., k-means cluster analysis) and more modern techniques (e.g., mixture modeling). Because the general partitioning problem can be formulated in graph-theoretic terms, his research also involves combinatorics and social network analysis. |
Shortcourse![]() |
2020 | |||||||||
Introduction to Machine Learning: Classification Algorithms (Abstract)
This short course discusses the applications of machine learning from a lay person’s perspective and presents the landscape of approaches and their utility. We then dive into a technical hands-on workshop implementing classification algorithms to build predictive models, tune them, and interpret their results. Applications include forecasting behaviors and events. Topics that will be covered include: introduction to machine learning & its applications, introduction to classification and supervised machine learning, classification algorithms, and classification performance metrics. *Pre-requisites: Attendees must be comfortable using R to manipulate data and must know how to create basic visualizations with ggplot2. |
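To make the workflow concrete, the sketch below trains and evaluates one common classifier; the randomForest package and the built-in iris data are stand-ins chosen for illustration, not the workshop's own materials.

```r
# Train/test split, random forest classifier, confusion matrix, accuracy
# (assumes the randomForest package)
library(randomForest)

set.seed(9)
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

fit  <- randomForest(Species ~ ., data = train, ntree = 500)
pred <- predict(fit, newdata = test)

table(predicted = pred, actual = test$Species)   # confusion matrix
mean(pred == test$Species)                       # overall accuracy
```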
Martin Skarzynski DATA SOCIETY (bio)
My primary research interest is in understanding health risk factors by combining scientific expertise from diverse fields with machine intelligence. I believe I am uniquely equipped to bridge the gaps between scientific disciplines and deliver on the promise of data science in health research. My preferred tools are R and Python, open source programming languages kept on the cutting edge by their active and supportive communities. Through research and teaching, I am constantly improving my ability to obtain, tidy, explore, transform, visualize, model, and communicate data. I aim to utilize my technical skills and science background to become a leader among the next generation of multidisciplinary data scientists. |
Shortcourse![]() |
2020 | |||||||||
Software Testing (Abstract)
Systematically testing software for errors and demonstrating that it meets the system specification is a necessary component of assuring trustworthiness of information systems. Software testing is often costly and time consuming when conducted correctly, but the consequence of poor quality testing is even higher, especially for critical systems. This short course will provide an introduction to software testing including the process of testing within the software development lifecycle, as well as techniques and considerations in choosing test cases for constructing comprehensive test suites to achieve coverage of the code and/or input space as relevant to the system under test. Existing tools for test automation and test suite construction will be presented. |
Erin Lanus Virginia Tech (bio)
Erin Lanus is a Research Assistant Professor at the Hume Center for National Security and Technology at Virginia Tech. She has a Ph.D. in Computer Science with a concentration in cybersecurity from Arizona State University. Her experience includes work as a Research Fellow at University of Maryland Baltimore County and as a High Confidence Software and Systems Researcher with the Department of Defense. Her current interests are software and combinatorial testing, machine learning in cybersecurity, and artificial intelligence assurance. |
Shortcourse![]() |
2020 | |||||||||
The Path to Assured Autonomy (Abstract)
Autonomous systems are becoming increasingly ubiquitous throughout society. They are used to optimize our living and work environments, protect our institutions and critical infrastructure, transport goods and people across the world, and so much more. However, there are fundamental challenges to (1) designing and verifying the safe and reliable operation of autonomous systems, (2) ensuring their security and resilience to adversarial attack, (3) predictably and seamlessly integrating autonomous systems into complex human ecosystems, and (4) ensuring the beneficial impact of autonomous systems on human society. In collaboration with government, industry, and academia, the Johns Hopkins Institute for Assured Autonomy addresses these challenges across three core pillars of technology, ecosystem, and policy & governance in order to drive a future where autonomous systems are trusted contributors to society. |
Cara LaPointe Director Johns Hopkins University Institute for Assured Autonomy (bio)
Dr. Cara LaPointe is a futurist who focuses on the intersection of technology, policy, ethics, and leadership. She works at the Johns Hopkins Applied Physics Laboratory where she serves as the Interim Co-Director of the Johns Hopkins Institute for Assured Autonomy to ensure that autonomous systems are safe, secure, and trustworthy as they are increasingly integrated into every aspect of our lives. During more than two decades in the United States Navy, Dr. LaPointe held numerous roles in the areas of autonomous systems, acquisitions, ship design and production, naval force architecture, power and energy systems, and unmanned vehicle technology integration. At the Deep Submergence Lab of the Woods Hole Oceanographic Institution (WHOI), she conducted research in underwater autonomy and robotics, developing sensor fusion algorithms for deep-ocean autonomous underwater vehicle navigation. Dr. LaPointe was previously a Senior Fellow at Georgetown University’s Beeck Center for Social Impact + Innovation where she created the “Blockchain Ethical Design Framework” as a tool to drive social impact and ethics into blockchain technology. Dr. LaPointe has served as an advisor to numerous global emerging technology initiatives and she is a frequent speaker on autonomy, artificial intelligence, blockchain, and other emerging technologies at a wide range of venues such as the United Nations, the World Bank, the Organization for Economic Co-operation and Development, SXSW, and the Aspen Institute in addition to various universities. Dr. LaPointe is a patented engineer, a White House Fellow, and a French American Foundation Young Leader. She served for two Presidents across an administration transition as the Interim Director of the President’s Commission on White House Fellowships. Cara holds a Doctor of Philosophy awarded jointly by the Massachusetts Institute of Technology (MIT) and WHOI, a Master of Science and a Naval Engineer degree from MIT, a Master of Philosophy from the University of Oxford, and a Bachelor of Science from the United States Naval Academy. |
Spotlight![]() |
2020 | |||||||||
Data Science from Scratch in the DoD (Abstract)
Although new organizations and companies are created every day in the private sector, in the public sector this is much rarer. As such, the establishment of Army Futures Command was the single most significant Army reorganization since 1973. The concept of the command is simple: build a better Army for years to come by harnessing artificial intelligence and big data analysis to quickly process information and identify trends to shape modernization efforts. This presentation will share lessons learned from standing up a Data and Decision Sciences Directorate within Army Futures Command and address the pitfalls associated with developing an enduring strategy and capability for a command managing a $30B+ modernization portfolio from day one. |
Cade Saie Director of Data and Decision Sciences US Army Futures Command (bio)
Lieutenant Colonel (Promotable) Cade Saie is an active officer in the United States Army and serves as an Operations Research Systems Analyst (ORSA). He was commissioned in the Army as an Infantry officer and has served in the 82nd Airborne and 1st Armored Divisions, commanding an infantry company in Ramadi, Iraq from 2005 to 2006. After command, he transitioned to FA49 (Operations Research Systems Analyst) and was assigned to the TRADOC Analysis Center-Ft. Lee before attending the Air Force Institute of Technology (AFIT), where he received master’s and doctoral degrees. Upon graduation, he was assigned to the U.S. Army Cyber Command at Ft. Belvoir, VA, where he established and led the command’s data science team from 2014 to 2017. In October 2017 he was selected to be part of the Army Futures Command Task Force at its standup and served in that capacity until the command was established, at which point he took a position as the AFC Chief Data Officer and the Director of Data and Decision Sciences. He has been published in the European Journal of Operational Research, the Journal of Algorithms and Computational Technology, and The R Journal. He was awarded the Dr. Wilbur B. Payne Memorial Award for Excellence in Analysis in 2009 and 2015. He has a B.S. in Computer Information Systems from Norwich University in Northfield, VT, an M.S. in Systems Engineering, and a PhD in Operations Research. |
Spotlight![]() |
2020 | |||||||||
Quantifying Science Needs and Mission Requirements through Uncertainty Quantification (Abstract)
With the existing breadth of Earth observations and our ever-increasing knowledge about our home planet, it proves more and more difficult to identify gaps in our current knowledge and the observations to fill these gaps. Sensitivity analyses (SA) and uncertainty quantification (UQ) in existing data and models can help to not only identify and quantify gaps but also provide the means to suggest how much impact a targeted new observation will have on the science in this field. This presentation will discuss general approaches and specific examples (e.g., Sea Level Rise research) of how we use SA and UQ to systematically assess where gaps in our current understanding of a science area lie, how to identify the observations that fill these gaps, and how to evaluate the expected impact of the observation on the scientific understanding in this area. |
Carmen Boening Deputy Manager Earth Science Section/Research Scientist Jet Propulsion Laboratory/California Institute of Technology (bio)
Dr. Carmen Boening is a Climate Scientist and Deputy Manager of the Earth Science Section at NASA’s Jet Propulsion Laboratory, California Institute of Technology. She received her PhD in physics with a focus on physical oceanography and space geodesy from the University of Bremen, Germany in 2009. After a postdoctoral appointment from 2009 to 2011, she started her current position in climate science at JPL. Since 2015, her responsibilities have expanded to management roles, including the group supervisor role for the Sea Level and Ice group (2015-2018), Project Scientist of the Gravity Recovery and Climate Experiment (GRACE), and, since 2018, Deputy Manager of the Earth Science Section at JPL. Her research focus is sea level science with an emphasis on how interannual fluctuations in the global water cycle influence sea level in the short and long term. Motivated by the fact that interannual and decadal variability have a significant impact on trend estimates and associated uncertainties, the estimation of uncertainties in climate predictions has become a significant part of her research. |
Spotlight![]() |
2020 | |||||||||
Tutorial: Combinatorial Methods for Testing and Analysis of Critical Software and Security Systems (Abstract)
Combinatorial methods have attracted attention as a means of providing strong assurance at reduced cost, but when are these methods practical and cost-effective? This tutorial includes two sections on the basis and application of combinatorial test methods: The first section explains the background, process, and tools available for combinatorial testing, with illustrations from industry experience with the method. The focus is on practical applications, including an industrial example of testing to meet FAA-required standards for life-critical software for commercial aviation. Other example applications include modeling and simulation, mobile devices, network configuration, and testing for a NASA spacecraft. The discussion will also include examples of measured resource and cost reduction in case studies from a variety of application domains. The second part explains combinatorial testing-based techniques for effective security testing of software components and large-scale software systems. It will develop quality assurance and effective re-verification for security testing of web applications and testing of operating systems. It will further address how combinatorial testing can be applied to ensure proper error-handling of network security protocols and provide the theoretical guarantees for detecting Trojans injected in cryptographic hardware. Procedures and techniques, as well as workarounds will be presented and captured as guidelines for a broader audience. |
Rick Kuhn, Dimitris Simos, and Raghu Kacker National Institute of Standards & Technology |
Tutorial |
![]() | 2019 | ||||||||
Tutorial: Cyber Attack Resilient Weapon Systems (Abstract)
This tutorial is an abbreviated version of a 36-hour short course recently provided by UVA to a class composed of engineers working at the Defense Intelligence Agency. The tutorial provides a definition for cyber attack resilience that is an extension of earlier definitions of system resilience that were not focused on cyber attacks. Based upon research results derived by the University of Virginia over an eight-year period through DoD/Army/AF/Industry funding, the tutorial will illuminate the following topics: 1) a Resilience Design Requirements methodology and the need for supporting analysis tools, 2) a System Architecture approach for achieving resilience, 3) example resilience design patterns and example prototype implementations, 4) experimental results regarding resilience-related roles and readiness of system operators, and 5) Test and Evaluation issues. The tutorial will be presented by UVA Munster Professor Barry Horowitz. |
Barry Horowitz Professor, Systems Engineering University of Virginia |
Tutorial |
![]() | 2019 | ||||||||
Tutorial: Learning Python and Julia (Abstract)
In recent years, the programming language Python with its supporting ecosystem has established itself as a significant capability to support the activities of the typical data scientist. Recently, version 1.0 of the programming language Julia has been released; from a software engineering perspective, it can be viewed as a modern alternative. This tutorial presents both Python and Julia from both a user and a developer point of view. From a user’s point of view, the basic syntax of each, along with fundamental prerequisite knowledge, is presented. From a developer’s point of view, the underlying infrastructure of the programming language / interpreter / compiler is discussed. |
Douglas Hodson Associate Professor Air Force Institute of Technology |
Tutorial | 2019 | |||||||||
Tutorial: Statistics Boot Camp (Abstract)
In the test community, we frequently use statistics to extract meaning from data. These inferences may be drawn with respect to topics ranging from system performance to human factors. In this mini-tutorial, we will begin by discussing the use of descriptive and inferential statistics, before exploring the basics of interval estimation and hypothesis testing. We will introduce common statistical techniques and when to apply them, and conclude with a brief discussion of how to present your statistical findings graphically for maximum impact. |
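A minimal base-R illustration of interval estimation and hypothesis testing on a two-sample comparison follows; the simulated detection times are hypothetical.

```r
# Two-sample comparison: parametric and nonparametric tests (toy data)
set.seed(2)
old_system <- rnorm(20, mean = 30, sd = 5)   # detection times (s)
new_system <- rnorm(20, mean = 26, sd = 5)

t.test(new_system, old_system)        # difference in means with a 95% confidence interval
wilcox.test(new_system, old_system)   # nonparametric alternative
```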
Kelly Avery IDA |
Tutorial |
![]() | 2019 | ||||||||
Tutorial: Reproducible Research (Abstract)
Analyses are “reproducible” if the same methods applied to the same data produce identical results when run again by another researcher (or you in the future). Reproducible analyses are transparent and easy for reviewers to verify, as results and figures can be traced directly to the data and methods that produced them. There are also direct benefits to the researcher. Real-world analysis workflows inevitably require changes to incorporate new or additional data, or to address feedback from collaborators, reviewers, or sponsors. These changes are easier to make when reproducible research best practices have been considered from the start. Poor reproducibility habits result in analyses that are difficult or impossible to review, are prone to compounded mistakes, and are inefficient to re-run in the future. They can lead to duplication of effort or even loss of accumulated knowledge when a researcher leaves your organization. With larger and more complex datasets, along with more complex analysis techniques, reproducibility is more important than ever. Although reproducibility is critical, it is often not prioritized either due to a lack of time or an incomplete understanding of end-to-end opportunities to improve reproducibility. This tutorial will discuss the benefits of reproducible research and will demonstrate ways that analysts can introduce reproducible research practices during each phase of the analysis workflow: preparing for an analysis, performing the analysis, and presenting results. A motivating example will be carried throughout to demonstrate specific techniques, useful tools, and other tips and tricks where appropriate. The discussion of specific techniques and tools is non-exhaustive; we focus on things that are accessible and immediately useful for someone new to reproducible research. The methods will focus mainly on work performed using R, but the general concepts underlying reproducible research techniques can be implemented in other analysis environments, such as JMP and Excel, and are briefly discussed. By implementing the approaches and concepts discussed during this tutorial, analysts in defense and aerospace will be equipped to produce more credible and defensible analyses of T&E data. |
Andrew Flack, Kevin Kirshenbaum, and John Haman IDA |
Tutorial |
![]() | 2019 | ||||||||
Tutorial: Developing Valid and Reliable Scales (Abstract)
The DoD uses psychological measurement to aid in decision-making about a variety of issues including the mental health of military personnel before and after combat, and the quality of human-systems interactions. To develop quality survey instruments (scales) and interpret the data obtained from these instruments appropriately, analysts and decision-makers must understand the factors that affect the reliability and validity of psychological measurement. This tutorial covers the basics of scale development and validation and discusses current efforts by IDA, DOT&E, ATEC, and JITC to develop validated scales for use in operational test and evaluation. |
Heather Wojton & Shane Hall IDA / USARMY ATEC |
Tutorial |
![]() | 2019 | ||||||||
The Bootstrap World (Abstract)
Bootstrapping is a powerful tool for statistical estimation and inference. In this tutorial, we will use operational test scenarios to provide context when exploring examples ranging from the simple (estimating a sample mean) to the complex (estimating a confidence interval for system availability). Areas of focus will include point estimates, confidence intervals, parametric bootstrapping and hypothesis testing with the bootstrap. The strengths and weaknesses of bootstrapping will also be discussed. |
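A minimal base-R sketch of the simple end of that range (a percentile bootstrap interval for a mean) follows; the miss-distance data are a hypothetical stand-in.

```r
# Nonparametric percentile bootstrap for a sample mean (toy data)
set.seed(7)
miss <- c(3.1, 4.7, 2.2, 6.0, 5.4, 3.8, 4.1, 2.9, 7.3, 4.6)

B <- 5000
boot_means <- replicate(B, mean(sample(miss, replace = TRUE)))

mean(miss)                                 # point estimate
quantile(boot_means, c(0.025, 0.975))      # 95% percentile bootstrap interval
```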
Dr. Matt Avery Research Staff Member IDA |
Tutorial | materials | 2016 | ||||||||
(Abstract)
This tutorial will provide attendees with a live demo of an open source software reliability tool to automatically apply models to data. Functionality to be illustrated includes how to: Select and view data in time between failures, cumulative failures, and failure intensity formats. Apply trend tests to determine if a data set exhibits reliability growth, which is a prerequisite to apply software reliability growth models. Apply models to a data set. Apply measures of model goodness of fit to obtain quantitative guidance to select one or more models based on the needs of the user. Query model results to determine the additional testing time required to achieve a desired reliability. Following this live demonstration, an overview of the underlying mathematical theory will be presented, including: Representation of failure data formats. Laplace trend test and running arithmetic average. Maximum likelihood estimation. Failure rate and failure counting software reliability models. Akaike information criterion and predictive sum of squares error. |
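As a small illustration of the trend-testing theory mentioned above, here is the Laplace trend statistic computed in base R; the failure times and observation window are hypothetical, and this is not the tool being demonstrated.

```r
# Laplace trend test for reliability growth (toy cumulative failure times)
t_fail <- c(10, 34, 58, 92, 140, 210, 295, 400)   # cumulative failure times
T_end  <- 450                                     # end of the observation period
n      <- length(t_fail)

u <- (mean(t_fail) / T_end - 0.5) * sqrt(12 * n)  # Laplace factor
u                                                 # u < 0 suggests reliability growth
pnorm(u)                                          # approximate one-sided p-value
```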
Dr. Lance Fiondella University of Massachusetts Dartmouth |
Tutorial | materials | 2016 | ||||||||
A Statistical Tool for Efficient and Information-Rich Testing (Abstract)
Binomial metrics like probability-to-detect or probability-to-hit typically provide operationally meaningful and easy-to-interpret test outcomes. However, they are information-poor metrics and extremely expensive to test. The standard power calculations to size a test employ hypothesis tests, which typically result in many tens to hundreds of runs. In addition to being expensive, the test is most likely inadequate for characterizing performance over a variety of conditions due to the inherently large statistical uncertainties associated with binomial metrics. A solution is to convert to a continuous variable, such as miss distance or time-to-detect. The common objection to switching to a continuous variable is that the hit/miss or detect/non-detect binomial information is lost, when the fraction of misses/no-detects is often the most important aspect of characterizing system performance. Furthermore, the new continuous metric appears to no longer be connected to the requirements document, which was stated in terms of a probability. These difficulties can be overcome with the use of censored data analysis. This presentation will illustrate the concepts and benefits of this approach, and will illustrate a simple analysis with data, including power calculations to show the cost savings for employing the methodology. |
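As a hedged sketch of the censored-data idea, the code below treats "no detection by end of trial" as a right-censored time-to-detect, fits a lognormal model, and recovers a probability-to-detect against a requirement. The survival package, the data, and the 60-second threshold are assumptions for illustration.

```r
# Censored time-to-detect analysis recovering a probability-to-detect
# (assumes the survival package; data and threshold are hypothetical)
library(survival)

ttd      <- c(12, 25, 31, 60, 60, 18, 44, 60)   # seconds; 60 = trial ended without detection
detected <- c( 1,  1,  1,  0,  0,  1,  1,  0)   # 0 = right-censored

fit <- survreg(Surv(ttd, detected) ~ 1, dist = "lognormal")

# Estimated probability of detecting within 60 s, from the fitted distribution
plnorm(60, meanlog = coef(fit), sdlog = fit$scale)
```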
Dr. Bram Lillard Research Staff Member IDA |
Tutorial | materials | 2016 | ||||||||
Power Analysis Concepts |
Dr. Jim Simpson JK Analytics |
Tutorial | materials | 2016 | ||||||||
Introduction to Survey Design (Abstract)
Surveys are a common tool for assessing user experiences with systems in various stages of development. This mini-tutorial introduces the social and cognitive processes involved in survey measurement and addresses best practices in survey design. Clarity of question wording, appropriate scale use, and methods for reducing survey-fatigue are emphasized. Attendees will learn practical tips to maximize the information gained from user surveys and should bring paper and pencils to practice writing and evaluating questions. |
Dr. Heather Wojton Research Staff Member IDA |
Tutorial | materials | 2016 | ||||||||
Introduction to Survey Design (Abstract)
Surveys are a common tool for assessing user experiences with systems in various stages of development. This mini-tutorial introduces the social and cognitive processes involved in survey measurement and addresses best practices in survey design. Clarity of question wording, appropriate scale use, and methods for reducing survey-fatigue are emphasized. Attendees will learn practical tips to maximize the information gained from user surveys and should bring paper and pencils to practice writing and evaluating questions. |
Dr. Justin Mary Research Staff Member IDA |
Tutorial | materials | 2016 | ||||||||
Introduction to Survey Design (Abstract)
Surveys are a common tool for assessing user experiences with systems in various stages of development. This mini-tutorial introduces the social and cognitive processes involved in survey measurement and addresses best practices in survey design. Clarity of question wording, appropriate scale use, and methods for reducing survey-fatigue are emphasized. Attendees will learn practical tips to maximize the information gained from user surveys and should bring paper and pencils to practice writing and evaluating questions. |
Mr. Jonathan Snavely IDA |
Tutorial | materials | 2016 | ||||||||
Bayesian Data Analysis in R/STAN (Abstract)
In an era of reduced budgets and limited testing, verifying that requirements have been met in a single test period can be challenging, particularly using traditional analysis methods that ignore all available information. The Bayesian paradigm is tailor made for these situations, allowing for the combination of multiple sources of data and resulting in more robust inference and uncertainty quantification. Consequently, Bayesian analyses are becoming increasingly popular in T&E. This tutorial briefly introduces the basic concepts of Bayesian Statistics, with implementation details illustrated in R through two case studies: reliability for the Core Mission functional area of the Littoral Combat Ship (LCS) and performance curves for a chemical detector in the Common Analytical Laboratory System (CALS) with different agents and matrices. Examples are also presented using RStan, a high-performance open-source software for Bayesian inference on multi-level models. |
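A minimal, self-contained example of the R/Stan workflow follows; the binomial model and the 17-of-20 data are illustrative assumptions, not the LCS or CALS case studies.

```r
# Minimal Bayesian binomial model in Stan, called from R (assumes the rstan package)
library(rstan)

model_code <- "
data { int<lower=0> n; int<lower=0> y; }
parameters { real<lower=0, upper=1> p; }
model { p ~ beta(1, 1); y ~ binomial(n, p); }
"

fit <- stan(model_code = model_code,
            data = list(n = 20, y = 17),   # e.g., 17 successes in 20 trials
            iter = 2000, chains = 4)

print(fit)   # posterior summary for the success probability p
```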
Dr. Kassandra Fronczyk IDA |
Tutorial | materials | 2016 | ||||||||
Bayesian Data Analysis in R/STAN (Abstract)
In an era of reduced budgets and limited testing, verifying that requirements have been met in a single test period can be challenging, particularly using traditional analysis methods that ignore all available information. The Bayesian paradigm is tailor made for these situations, allowing for the combination of multiple sources of data and resulting in more robust inference and uncertainty quantification. Consequently, Bayesian analyses are becoming increasingly popular in T&E. This tutorial briefly introduces the basic concepts of Bayesian Statistics, with implementation details illustrated in R through two case studies: reliability for the Core Mission functional area of the Littoral Combat Ship (LCS) and performance curves for a chemical detector in the Common Analytical Laboratory System (CALS) with different agents and matrices. Examples are also presented using RStan, a high-performance open-source software for Bayesian inference on multi-level models. |
Dr. James Brownlow U.S. Air Force 812TSS/ENT |
Tutorial | materials | 2016 | ||||||||
Presenting Complex Statistical Methodologies to Military Leadership (Abstract)
More often than not, the data we analyze for the military is plagued with statistical issues. Multicollinearity, small sample sizes, quasi-experimental designs, and convenience samples are some examples of what we commonly see in military data. Many of these complications can be resolved either in the design or analysis stage with appropriate statistical procedures. But, to keep our work useful, usable, and transparent to the military leadership who sponsors it, we must strike the elusive balance between explaining and justifying our design and analysis techniques and not inundating our audience with unnecessary details. It can be even more difficult to get military leadership to understand the statistical problems and solutions so well that they are enthused and supportive of our approaches. Using literature written on the subject as well as a variety of experiences, we will showcase several examples, as well as present ideas for keeping our clients actively engaged in statistical methodology discussions. |
Dr. Jane Pinelis Johns Hopkins University, Applied Physics Lab |
Tutorial | materials | 2016 | ||||||||
Sensitivity Experiments (Abstract)
A sensitivity experiment is a special type of experimental design that is used when the response variable is binary and the covariate is continuous. Armor protection and projectile lethality tests often use sensitivity experiments to characterize a projectile’s probability of penetrating the armor. In this mini-tutorial we illustrate the challenge of modeling a binary response with a limited sample size, and show how sensitivity experiments can mitigate this problem. We review eight different single-covariate sensitivity experiments and present a comparison of these designs using simulation. Additionally, we cover sensitivity experiments for cases that include more than one covariate, and highlight recent research in this area. The mini-tutorial concludes with a case study by Greg Hutto on Army grenade fuze testing, titled “Preventing Premature ZAP: EMPATHY Capacitive Design With 3 Phase Optimal Design (3pod).” |
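To illustrate the underlying model (though not the sequential designs themselves), here is a base-R probit fit that recovers a V50-style threshold from binary penetration data; the velocities and outcomes are hypothetical.

```r
# Probit model for binary penetration data and an estimated 50% threshold (toy data)
vel <- c(800, 820, 840, 860, 880, 900, 920, 940)   # projectile velocity
pen <- c(  0,   0,   1,   0,   1,   1,   1,   1)   # 1 = penetration

fit <- glm(pen ~ vel, family = binomial(link = "probit"))

# Velocity at which the estimated penetration probability is 50% (V50)
unname(-coef(fit)[1] / coef(fit)[2])
```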
Dr. Thomas Johnson Research Staff Member IDA |
Tutorial | materials | 2016 | ||||||||
Sensitivity Experiments (Abstract)
A sensitivity experiment is a special type of experimental design that is used when the response variable is binary and the covariate is continuous. Armor protection and projectile lethality tests often use sensitivity experiments to characterize a projectile’s probability of penetrating the armor. In this mini-tutorial we illustrate the challenge of modeling a binary response with a limited sample size, and show how sensitivity experiments can mitigate this problem. We review eight different single-covariate sensitivity experiments and present a comparison of these designs using simulation. Additionally, we cover sensitivity experiments for cases that include more than one covariate, and highlight recent research in this area. The mini-tutorial concludes with a case study by Greg Hutto on Army grenade fuze testing, titled “Preventing Premature ZAP: EMPATHY Capacitive Design With 3 Phase Optimal Design (3pod).” |
Mr. Greg Hutto U.S. Air Force, 96 Test Wing |
Tutorial | materials | 2016 | ||||||||
Creating Shiny Apps in R for Sharing Automated Statistical Products (Abstract)
Interactive web apps can be built straight from R with the R package Shiny. Shiny apps are becoming more prevalent as a way to automate statistical products and share them with others who do not know R. This tutorial will cover Shiny app syntax and how to create basic Shiny apps. Participants will create basic apps by working through several examples and explore how to change and improve these apps. Participants will leave the session with the tools to create their own more complex applications. Participants will need a computer with R, RStudio, and the shiny R package installed. |
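For orientation, here is a complete minimal Shiny app; the slider and the placeholder histogram are illustrative and not taken from the tutorial's examples.

```r
# A minimal Shiny app: one input, one reactive plot (assumes the shiny package)
library(shiny)

ui <- fluidPage(
  sliderInput("n", "Sample size", min = 10, max = 1000, value = 100),
  plotOutput("hist")
)

server <- function(input, output) {
  output$hist <- renderPlot(hist(rnorm(input$n), main = "Simulated data"))
}

shinyApp(ui, server)   # launches the app locally in a browser
```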
Mr. Randy Griffiths U.S. Army Evaluation Center |
Tutorial | materials | 2018 | ||||||||
Determination of Power for Complex Experimental Designs (Abstract)
Power tells us the probability of rejecting the null hypothesis for an effect of a given size, and helps us select an appropriate design prior to running the experiment. The key to computing power for an effect is determining the size of the effect. We describe a general approach for sizing effects that covers a wide variety of designs including two-level factorials, multilevel factorials with categorical levels, split-plot and response surface designs. The application of power calculations to DoE is illustrated by way of several case studies. These case studies include both continuous and binomial responses. In the case of response surface designs, the fitted model is usually used for drawing contour maps, 3D surfaces, making predictions, or performing optimization. For these purposes, it is important that the model adequately represent the response behavior over the region of interest. Therefore, power to detect individual model parameters is not a good measure of what we are designing for. A discussion and pertinent examples will show attendees how the precision of the fitted surface (i.e. the precision of the predicted response) relative to the noise is a critical criterion in design selection. In this presentation, we introduce a process to determine if the design has adequate precision for DoE needs. |
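A simulation-based sketch of sizing an effect and estimating power for a binomial response follows; the design, effect sizes, and alpha level are illustrative assumptions rather than the case studies referenced above.

```r
# Simulation-based power for factor A in a replicated 2^2 design with a binary response
power_sim <- function(n_sims = 500, p_low = 0.6, p_high = 0.8, alpha = 0.05) {
  d <- expand.grid(A = c(-1, 1), B = c(-1, 1), rep = 1:10)   # 2^2 design, 10 replicates
  p <- ifelse(d$A > 0, p_high, p_low)                        # factor A shifts the success probability
  rejections <- replicate(n_sims, {
    d$y <- rbinom(nrow(d), size = 1, prob = p)               # simulate one experiment
    fit <- glm(y ~ A + B, family = binomial, data = d)
    summary(fit)$coefficients["A", "Pr(>|z|)"] < alpha       # was the A effect detected?
  })
  mean(rejections)                                           # estimated power for factor A
}

set.seed(3)
power_sim()
```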
Mr. Pat Whitcomb Stat-Ease, Inc |
Tutorial | materials | 2018 | ||||||||
Operational Testing of Cyber Systems (Abstract)
Previous operational tests that included cybersecurity focused on vulnerabilities discovered at the component level and on ad hoc system-level exploitation attacks during adversarial assessments. The subsequent evaluations of vulnerabilities and attacks as they relate to the overall resilience of the system were largely qualitative in nature and full of human-centered biases, making them unreliable estimators of system resilience in a cyber-contested environment. To mitigate these shortcomings, this tutorial will present an approach for more structured operational tests based on common search algorithms, and more rigorous quantitative measurements and analysis based on actuarial methods for estimating resilience. |
Mr. Paul Johnson MCOTEA |
Tutorial | materials | 2018 | ||||||||
Demystifying Data Science (Abstract)
Data science is the new buzz word – it is being touted as the solution for everything from curing cancer to self-driving cars. How is data science related to traditional statistics methods? Is data science just another name for “big data”? In this mini-tutorial, we will begin by discussing what data science is (and is not). We will then discuss some of the key principles of data science practice and conclude by examining the classes of problems and methods that are included in data science. |
Dr. Alyson Wilson Laboratory for Analytic Sciences North Carolina State University |
Tutorial | materials | 2018 | ||||||||
Statistics Boot Camp (Abstract)
In the test community, we frequently use statistics to extract meaning from data. These inferences may be drawn with respect to topics ranging from system performance to human factors. In this mini-tutorial, we will begin by discussing the use of descriptive and inferential statistics. We will continue by discussing commonly used parametric and nonparametric statistics within the defense community, ranging from comparisons of distributions to comparisons of means. We will conclude with a brief discussion of how to present your statistical findings graphically for maximum impact. |
Dr. Stephanie Lane Research Staff Member IDA |
Tutorial |
![]() | 2018 | ||||||||
Robust Parameter Design (Abstract)
The Japanese industrial engineer, Taguchi, introduced the concept of robust parameter design in the 1950s. Since then, it has seen widespread, successful application in automotive and aerospace applications. Engineers have applied this methodology both to physical and computer experimentation. This tutorial provides a basic introduction to these concepts, with an emphasis on how robust parameter design provides a proper basis for the evaluation and confirmation of system performance. The goal is to show how to modify basic robust parameter designs to meet the specific needs of the weapons testing community. This tutorial targets systems engineers, analysts, and program managers who must evaluate and confirm complex system performance. The tutorial illustrates new ideas that are useful for the evaluation and the confirmation of the performance for such systems. What students will learn: • The basic concepts underlying robust parameter design • The importance of the statistical concept of interaction to robust parameter design • How statistical interaction is the key concept underlying much of the evaluation and confirmation of system performance, particularly of weapon systems |
Dr. Geoff Vining | Tutorial | materials | 2018 | ||||||||
Exploratory Data Analysis (Abstract)
After decades of seminal methodological research on the subject—accompanied by a myriad of applications—John Tukey formally created the statistical discipline known as EDA with the publication of his book “Exploratory Data Analysis” in 1977. The breadth and depth of this book was staggering, and its impact pervasive, running the gamut from today’s routine teaching of box plots in elementary schools, to the existent core philosophy of data exploration “in-and-for-itself” embedded in modern day statistics and AI/ML. As important as EDA was at its inception, it is even more essential now, with data sets increasing in both complexity and size. Given a science & engineering problem/question, and given an existing data set, we argue that the most important deliverable in the problem-solving process is data-driven insight; EDA visualization techniques lie at the core of extracting that insight. This talk has 3 parts: 1. Data Diamond: In light of the focus of DATAWorks to share essential methodologies for operational testing/evaluation, we first present a problem-solving framework (simple in form but rich in content) constructed and fine-tuned over 4 decades of scientific/engineering problem-solving: the data diamond. This data-centric structure has proved essential for systematically approaching a variety of research and operational problems, for determining if the data on hand has the capacity to answer the question at hand, and for identifying weaknesses in the total experimental effort that might compromise the rigor/correctness of derived solutions. 2. EDA Methods & Block Plot: We discuss those EDA graphical tools that have proved most important/insightful (for the presenter) in attacking the wide variety of physical/chemical/ biological/engineering/infotech problems existent in the NIST environment. Aside from some more commonly-known EDA tools in use, we discuss the virtues/applications of the block plot, which is a tool specifically designed for the “comparative” problem type–ascertaining as to whether the (yes/no) conclusion about the statistical significance of a single factor under study, is in fact robustly true over the variety of other factors (material/machine/method/operator/ environment, etc.) that co-exist in most systems. The testing of army bullet-proof vests is used as an example. 3. 10-Step DEX Sensitivity Analysis: Since the rigor/robustness of testing & evaluation conclusions are dictated not only by the choice of (post-data) analysis methodologies, but more importantly by the choice of (pre-data) experiment design methodologies, we demonstrate a recommended procedure for the important “sensitivity analysis” problem–determining what factors most affect the output of a multi-factor system. The deliverable is a ranked list (ordered by magnitude) of main effects (and interactions). Design-wise, we demonstrate the power and efficiency of orthogonal fractionated 2-level designs for this problem; analysis-wise, we present a structured 10-step graphical analysis which provides detailed data-driven insight into what “drives” the system, what optimal settings exist for the system, what prediction model exists for the system, and what direction future experiments should be to further optimize the system. The World Trade Center collapse analysis is used as an example. |
Dr. Jim Filliben | Tutorial | 2018 | |||||||||
Quality Control and Statistical Process Control (Abstract)
The need to draw causal inference about factors not under the researchers’ control calls for a specialized set of techniques developed for observational studies. The persuasiveness and adequacy of such an analysis depends in part on the ability to recover metrics from the data that would approximate those of an experiment. This tutorial will provide a brief overview of the common problems encountered with lack of randomization, as well as suggested approaches for rigorous analysis of observational studies. |
Dr. Jane Pinelis Research Staff Member IDA |
Tutorial |
![]() | 2018 | ||||||||
Evolving Statistical Tools (Abstract)
In this session, researchers from the Institute for Defense Analyses (IDA) present a collection of statistical tools designed to meet ongoing and emerging needs for planning, designing, and evaluating operational tests. We first present a suite of interactive applications hosted on testscience.org that are designed to address common analytic needs in the operational test community. These freely available resources include tools for constructing confidence intervals, computing statistical power, comparing distributions, and computing Bayesian reliability. Next, we discuss four dedicated software tools: JEDIS – a JMP Add-In for automating power calculations for designed experiments; skpr – an R package for generating optimal experimental designs and easily evaluating power for normal and non-normal response variables; ciTools – an R package for quickly and simply generating confidence intervals and quantifying uncertainty for simple and complex linear models; and nautilus – an R package for visualizing and analyzing aspects of sensor performance, such as detection range and track completeness. |
Dr. Matthew Avery Research Staff Member IDA |
Tutorial | materials | 2018 | ||||||||
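For readers who want a feel for what the interval-construction tools above automate, the following generic Python sketch computes a percentile bootstrap confidence interval for a mean. It is not the ciTools or testscience.org implementation (ciTools is an R package); the exponential stand-in data and the 95% level are assumptions.

```python
# Generic percentile-bootstrap confidence interval for a mean. This is a
# sketch of the idea behind the interval tools above, not their actual API;
# the exponential stand-in data are an assumption.
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=40)     # stand-in for observed test data

# Resample the data with replacement many times and record each resample's mean.
boot_means = np.array([rng.choice(data, size=data.size, replace=True).mean()
                       for _ in range(5000)])

lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"Mean = {data.mean():.2f}, 95% bootstrap CI = ({lower:.2f}, {upper:.2f})")
```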
Evolving Statistical Tools |
Dr. Tyler Morgan-Wall Research Staff Member IDA |
Tutorial | materials | 2018 | ||||||||
Evolving Statistical Tools |
Dr. Benjamin Ashwell Research Staff Member IDA |
Tutorial | materials | 2018 | ||||||||
Evolving Statistical Tools |
Dr. Kevin Kirshenbaum Research Staff Member IDA |
Tutorial | materials | 2018 | ||||||||
Evolving Statistical Tools |
Dr. Stephanie Lane Research Staff Member IDA |
Tutorial | materials | 2018 | ||||||||
Evolving Statistical Tools |
Dr. Jason Sheldon Research Staff Member IDA |
Tutorial | materials | 2018 | ||||||||
Strategies for Sequential Experimentation (Abstract)
Design of experiments is typically presented as a “one shot” approach. However, it may be more efficient to divide the experiment into smaller pieces, expending resources in a smarter, more adaptive manner. This sequential approach is especially suitable when experimenters begin with very little information about the process, for example, when scaling up a new product. It allows for better definition of the design space, adaptation to unexpected results, estimation of variability, reduction in waste, and validation of the results. The statistical literature primarily focuses on sequential experimentation in the context of screening, which in our experience is only the beginning of an overall strategy for experimentation. This tutorial begins with screening and then goes well beyond this first step for more complete coverage of this important topic.
|
Martin Bezener Director of Research & Development Stat-Ease |
Tutorial | 2020 | |||||||||
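A minimal Python sketch of the sequential mindset described above: run a small screening stage first, then spend additional runs (here, center points to check for curvature) only where the first stage indicates they are needed. The two-factor toy response is an assumption for illustration, not an example from the tutorial.

```python
# Two-stage sketch of sequential experimentation: screen with a 2-level
# factorial, then spend extra runs on center points to check for curvature
# before committing to a full response-surface design. The two-factor
# response function below is an assumed stand-in for a real process.
import itertools
import numpy as np

rng = np.random.default_rng(7)

def run_process(x):
    # Hidden "true" process: linear terms plus curvature in the first factor.
    return 5 + 2 * x[0] - x[1] + 1.5 * x[0] ** 2 + rng.normal(0, 0.2)

# Stage 1: 2^2 factorial screen in coded units.
stage1 = np.array(list(itertools.product([-1.0, 1.0], repeat=2)))
y1 = np.array([run_process(run) for run in stage1])

# Stage 2: augment with a few center points instead of designing everything up front.
centers = np.zeros((3, 2))
y2 = np.array([run_process(run) for run in centers])

# A large factorial-vs-center gap suggests curvature, so the next stage
# should add axial runs to support a second-order model.
print(f"Curvature check (factorial mean - center mean): {y1.mean() - y2.mean():+.2f}")
```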
Deep Learning Models for Image Analysis & Object Recognition |
Ridhima Amruthesh DATA SOCIETY (bio)
Ridhima is a data scientist who enjoys using the Python and R programming languages to explore, analyze, visualize, and present data. Ridhima believes that anyone, regardless of their background, can learn and benefit from technical skills. In addition to teaching at Data Society, Ridhima has a strong background in computer science, which led her to pursue a master’s degree in Information Systems. She realized her passion was helping educate others, across all fields, on the importance of using data to derive insights. She is currently a manager at Data Society, where she helps build and grow the data science team, educates others, and creates interactive, insightful visualizations for clients. Ridhima holds an MS in Information Systems from the University of Maryland, College Park, and a BE in Computer Science and Engineering. |
Tutorial | 2020 | |||||||||
Machine Learning for Cybersecurity – Self-Learning Systems for Threat Detection |
Nisha Iyer DATA SOCIETY (bio)
Nisha is a data scientist who enjoys using the Python and R programming languages to explore, analyze, visualize, and present data. Nisha believes that anyone, regardless of their background, can learn and benefit from technical skills. In addition to teaching at Data Society, Nisha has worked in corporate consulting and media to build and grow data science teams. During this time, she not only built the teams but also educated others in the company on the importance of and need for data science. Data Society has helped her grow her passion for spreading data literacy across commercial and government clients. Nisha holds an MS in Data Science from the George Washington University in Washington, DC, and a BA in Communication from the University of Maryland, College Park. |
Tutorial | 2020 | |||||||||
Taking Down a Turret: Introduction to Cyber Operational Test and Evaluation (Abstract)
Cyberattacks are in the news every day, from data breaches of banks and stores to ransomware attacks shutting down city governments and delaying school years. In this mini-tutorial, we introduce key cybersecurity concepts and methods for conducting cybersecurity test and evaluation. We walk you through a live demonstration of a cyberattack and provide real-world examples of each major step we take. The demonstration shows an attacker gaining command and control of a Nerf turret. We leverage tools commonly used by red teams to explore an attack scenario involving phishing, network scanning, password cracking, pivoting, and finally creating a mission effect. We also provide a defensive view and analytics that show artifacts left by the attack path. |
OED Cyber Lab IDA |
Tutorial | 2020 | |||||||||
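As a benign illustration of the network-scanning step in the attack chain described above, the Python sketch below performs a simple TCP connect check against localhost only. The live demonstration relies on dedicated red-team tooling; the host and port list here are placeholder assumptions.

```python
# Benign sketch of the "network scanning" step: a plain TCP connect check
# against localhost only. The host and port list are placeholder assumptions;
# the live demo uses dedicated red-team tooling.
import socket

def open_ports(host="127.0.0.1", ports=(22, 80, 443, 8080), timeout=0.5):
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            if sock.connect_ex((host, port)) == 0:  # 0 means the TCP connection succeeded
                found.append(port)
    return found

print("Open ports on localhost:", open_ports())
```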
Part 1 (Abstract)
Invariably, any analyst who has been in the field long enough has heard the dreaded questions: “Is X number of samples enough? How much data do I need for my experiment?” Ulterior motives aside, any investigation involving data must ultimately answer the question of “How many?” to avoid either collecting insufficient data to detect a scientifically significant effect or collecting too much data and wasting valuable resources. This can become particularly difficult when the underlying model is complex (e.g., longitudinal designs with hard-to-change factors, time-to-event responses with censoring, binary responses with non-uniform test levels, etc.). Even in the supposedly simpler case of categorical factors, where run size is often chosen using a lower-bound power calculation, a simple approach can mask more “powerful” techniques. In the first half of this tutorial, we will explore how to use simulation to perform power calculations in complex modeling situations drawn from relevant defense applications. Techniques will be illustrated using both R and JMP Pro. In the second half, we will investigate the case of categorical factors and illustrate how treating the unknown effects as random variables induces a distribution on statistical power, which can then be used as a new way to assess experimental designs. Instructor Bio: Caleb King is a Research Statistician Tester for the DOE platform in the JMP software. He received his MS and PhD in Statistics from Virginia Tech and worked for three years as a statistical scientist at Sandia National Laboratories prior to arriving at JMP. His areas of expertise include optimal design of experiments, accelerated testing, reliability analysis, and small-sample theory. |
Caleb King JMP Division, SAS Institute Inc. |
Tutorial![]() |
materials | 2020 | ||||||||
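The tutorial itself works in R and JMP Pro; as a rough companion, the Python sketch below shows the simulation approach to power for one of the “complex” cases mentioned above, a binary response with a single two-level factor. The run size, effect size, and significance level are assumptions chosen for illustration.

```python
# Power-by-simulation sketch for a binary response with one two-level factor.
# Run size, effect size (on the log-odds scale), and alpha are assumptions;
# the tutorial itself illustrates these calculations in R and JMP Pro.
import numpy as np
import statsmodels.api as sm

def simulated_power(n=100, beta=0.8, alpha=0.05, n_sims=1000, seed=42):
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.choice([-1.0, 1.0], size=n)           # coded factor levels
        p = 1 / (1 + np.exp(-(-0.5 + beta * x)))      # true logistic model
        y = rng.binomial(1, p)
        fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
        rejections += fit.pvalues[1] < alpha          # test the factor effect
    return rejections / n_sims

print(f"Estimated power: {simulated_power():.2f}")
```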
21st Century Screening Designs (Abstract)
Since 2000 there have been several innovations in screening experiment design and analysis. Many of these methods are now available in commercial-off-the-shelf (COTS) software. Developments include: improved Nearly Orthogonal Arrays (NOAs) (2002, 2006), Definitive Screening Designs (DSDs) (2011), weighted A-optimal designs, and Group-orthogonal Supersaturated Designs (GO SSDs) (2019). • NOAs have proven effective for finding well-balanced screening designs, especially when many or all factors are categorical with different numbers of levels. • DSDs are capable of collapsing into response surface designs. When too many factors are significant to support a response surface model, DSDs can be efficiently augmented to do so. • A-optimal designs allow the experimenter to leverage their knowledge to weight the importance of model terms, improving design performance relative to D-optimal and I-optimal designs. • GO SSDs allow experimenters to run fewer trials than there are factors, with a strong likelihood that significant factors will be orthogonal. If they are not orthogonal, they can efficiently be made so by folding the design over and adding four new rows. This tutorial will give examples of each of these approaches to screening many factors and provide rules of thumb for choosing which to apply for any specific problem type. |
Thomas Donnelly Principal Systems Engineer SAS Institute Inc. |
Tutorial | 2020 | |||||||||
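The fold-over idea mentioned above for GO-SSDs can be illustrated with any small 2-level design. The Python sketch below uses a resolution III 2^(3-1) design (an assumption for simplicity, not a GO-SSD) to show how appending the sign-reversed runs removes the aliasing between a main effect and a two-factor interaction.

```python
# Minimal illustration of the fold-over idea: appending the sign-reversed
# runs of a 2-level design de-aliases main effects from two-factor
# interactions. The 2^(3-1) design used here is just an example.
import numpy as np

# Resolution III design with C = AB, so main effect C is aliased with interaction AB.
d = np.array([[-1, -1,  1],
              [ 1, -1, -1],
              [-1,  1, -1],
              [ 1,  1,  1]], dtype=float)

def alias_corr(design):
    c, ab = design[:, 2], design[:, 0] * design[:, 1]
    return float(np.corrcoef(c, ab)[0, 1])

folded = np.vstack([d, -d])   # fold-over: add the mirror-image runs
print("Correlation of C with AB before fold-over:", alias_corr(d))
print("Correlation of C with AB after fold-over: ", alias_corr(folded))
```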
Introduction to Uncertainty Quantification for Practitioners and Engineers (Abstract)
Uncertainty is an inescapable reality that can be found in nearly all types of engineering analyses. It arises from sources like measurement inaccuracies, material properties, boundary and initial conditions, and modeling approximations. Uncertainty Quantification (UQ) is a systematic process that puts error bands on results by incorporating real-world variability and probabilistic behavior into engineering and systems analysis. UQ answers the question: What is likely to happen when the system is subjected to uncertain and variable inputs? Answering this question facilitates significant risk reduction, robust design, and greater confidence in engineering decisions. Modern UQ techniques use powerful statistical models to map the input-output relationships of the system, significantly reducing the number of simulations or tests required to get accurate answers. This tutorial will present common UQ processes that operate within a probabilistic framework. These include statistical Design of Experiments, statistical emulation methods used to model the relationship between simulation inputs and responses, and statistical calibration for model validation and tuning to better represent test results. Examples from different industries will be presented to illustrate how the covered processes can be applied to engineering scenarios. This is purely an educational tutorial and will focus on the concepts, methods, and applications of probabilistic analysis and uncertainty quantification; SmartUQ software will be used only to illustrate the methods and examples presented. This is an introductory tutorial designed for practitioners and engineers with little to no formal statistical training. However, statisticians and data scientists may also benefit from seeing the material presented from a practical-use perspective rather than a purely technical one. There are no prerequisites other than an interest in UQ. Attendees will gain an introductory understanding of Probabilistic Methods and Uncertainty Quantification, basic UQ processes used to quantify uncertainties, and the value UQ can provide in maximizing insight, improving design, and reducing time and resources. Instructor Bio: Gavin Jones, Sr. SmartUQ Application Engineer, is responsible for performing simulation and statistical work for clients in aerospace, defense, automotive, gas turbine, and other industries. He is also a key contributor to SmartUQ’s Digital Twin/Digital Thread initiative. Mr. Jones received a B.S. in Engineering Mechanics and Astronautics and a B.S. in Mathematics from the University of Wisconsin-Madison. |
Gavin Jones Sr. Application Engineer SmartUQ |
Tutorial![]() |
2020 | |||||||||
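The basic UQ workflow described above can be sketched in a few lines of Python: sample a small design over the uncertain inputs, fit a cheap emulator to the expensive model's outputs, then push a large Monte Carlo sample of the inputs through the emulator to obtain an output distribution. The stand-in model, input distributions, and quadratic emulator below are illustrative assumptions, not SmartUQ's methods.

```python
# Sketch of a basic UQ workflow: design -> emulator -> uncertainty propagation.
# The model, input distributions, and polynomial emulator are assumptions.
import numpy as np

rng = np.random.default_rng(3)

def expensive_model(x1, x2):
    # Stand-in for a physics simulation or test article.
    return 3.0 * x1 ** 2 + np.sin(x2) + 0.1 * x1 * x2

# Small "design of experiments" over the input ranges.
x1_train = rng.uniform(-1, 1, 30)
x2_train = rng.uniform(0, np.pi, 30)
y_train = expensive_model(x1_train, x2_train)

# Cheap quadratic emulator fit by least squares.
def features(x1, x2):
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1*x2])

coef, *_ = np.linalg.lstsq(features(x1_train, x2_train), y_train, rcond=None)

# Propagate input uncertainty through the emulator (the cheap step).
x1_mc = rng.normal(0.2, 0.1, 100_000)
x2_mc = rng.normal(1.5, 0.3, 100_000)
y_mc = features(x1_mc, x2_mc) @ coef
print(f"Output mean = {y_mc.mean():.3f}, 95% band = "
      f"({np.percentile(y_mc, 2.5):.3f}, {np.percentile(y_mc, 97.5):.3f})")
```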
Cyber Tutorial |
Peter Mancini IDA |
![]() |
![]() | 2020 | ||||||||
Cyber Tutorial |
Mark Herrera IDA |
![]() |
![]() | 2020 | ||||||||
Part 2 (Abstract)
Instructor Bio: Ryan Lekivetz is a Senior Research Statistician Developer for the JMP Division of SAS, where he implements features for the Design of Experiments platforms in JMP software. |
Ryan Lekivetz JMP Division, SAS Institute Inc. |
![]() |
![]() | 2020 | ||||||||
Cyber Tutorial |
Kelly Tran IDA |
![]() |
2020 | |||||||||
Cyber Tutorial |
Jason Schlup IDA |
![]() |
2020 | |||||||||
Cyber Tutorial |
Lee Allison IDA |
![]() |
2020 |
Name | Organization | Type | Yr |
---|---|---|---|
Peter Parker | NASA | Workshop Organizer | 2016 |
Laura Freeman | IDA | Workshop Organizer | 2016 |
Rebecca Medlin | IDA | Logistics | 2019 |
David Greene | IDA | Logistics | 2019 |
Heather Wojton | IDA | Co-Chair | 2019 |
Ronald Fricker | Department of Statistics, Virginia Tech | Co-Chair | 2019 |
Amy Braverman | Jet Propulsion Laboratory, California Institute of Technology | Co-Chair | 2019 |
Laura Freeman | Virginia Tech University | Co-Chair | 2019 |
Jonathan Rathsam | NASA | Co-Chair | 2019 |
Jane Pinelis | Johns Hopkins University Applied Physics Laboratory | Co-Chair | 2019 |
Julia Shirley | IDA | Logistics | 2018 |
Heather Wojton | IDA | Logistics | 2018 |
Kendall Beebe | IDA | Logistics | 2018 |
Jonathan Rathsam | IDA | Co-Chair | 2018 |
Jane Pinelis | IDA | Co-Chair | 2018 |
Laura Freeman | IDA | Co-Chair | 2018 |
Alyson Wilson | North Carolina State University | Co-Chair | 2018 |
Alyson Wilson | North Carolina State University | Co-Chair | 2019 |
Diane Quarles | Army Research Lab | TPC | 2019 |
Laura Castro-Schilo | JMP | TPC | 2019 |
Poornima Madhavan | IDA | TPC | 2019 |