Session Title | Speaker | Type | Recording | Materials | Year |
---|---|---|---|---|---|
Breakout Characterizing Human-Machine Teaming Metrics for Test and Evaluation (Abstract)
As advanced technologies and capabilities enable machines to engage in tasks that previously only humans have performed, new challenges have emerged for the rigorous testing and evaluation (T&E) of human-machine teaming (HMT) concepts. We distinguish an HMT from a human using a tool and enumerate the new challenges: agents’ mental models are opaque, and machine-to-human communications, self-tasking, and autonomy all need to be evaluated. We argue that a focus on mission outcomes alone cannot fully characterize team performance, given the enlarged problem space under evaluation, and that the T&E community needs to develop and refine new metrics for the agents of a team and for teammate interactions. Our IDA HMT framework outlines major categories for HMT evaluation, emphasizing team metrics and parallelizing agent metrics across humans and machines. The major categories are tied to the literature and proposed as a starting point for further T&E metric specification to support robust evaluation. |
Brian Vickers Research Staff Member Institute for Defense Analyses (bio)
Brian is a Research Staff Member at the Institute for Defense Analyses, where he applies rigorous statistics and study design to evaluate, test, and report on various programs. Dr. Vickers holds a Ph.D. from the University of Michigan, Ann Arbor, where he researched various factors that influence decision making, with a focus on how people allocate their money, time, and other resources. |
Breakout |
| 2021 |
|
Keynote Closing Remarks |
Robert Behler Director DOT&E (bio)
Robert F. Behler was sworn in as Director of Operational Test and Evaluation on December 11, 2017. A Presidential appointee confirmed by the United States Senate, he serves as the senior advisor to the Secretary of Defense on operational and live fire test and evaluation of Department of Defense weapon systems. Prior to his appointment, he was the Chief Operating Officer and Deputy Director of the Carnegie Mellon University Software Engineering Institute (SEI), a Federally Funded Research and Development Center. SEI is a global leader in advancing software development and cybersecurity to solve the nation’s toughest problems through focused research, development, and transition to the broader software engineering community. Before joining the SEI, Mr. Behler was the President and CEO of SRC, Inc. (formerly the Syracuse Research Corporation). SRC is a not-for-profit research and development corporation with a for-profit manufacturing subsidiary that focuses on radar, electronic warfare, and cybersecurity technologies. Prior to working at SRC, Mr. Behler was the General Manager and Senior Vice President of the MITRE Corp., where he provided leadership to more than 2,500 technical staff in 65 worldwide locations. He joined MITRE from the Johns Hopkins University Applied Physics Laboratory, where he was a General Manager for more than 350 scientists and engineers as they made significant contributions to critical Department of Defense (DOD) precision engagement challenges. General Behler served 31 years in the United States Air Force, retiring as a Major General in 2003. During his military career, he was the Principal Adviser for Command and Control, Intelligence, Surveillance and Reconnaissance (C2ISR) to the Secretary and Chief of Staff of the U.S. Air Force (USAF). International assignments as a general officer included serving as the Deputy Commander for NATO’s Joint Headquarters North in Stavanger, Norway. He was the Director of the Senate Liaison Office for the USAF during the 104th Congress. Mr. Behler also served as the assistant for strategic systems to the Director of Operational Test and Evaluation. As an experimental test pilot, he flew more than 65 aircraft types. Operationally, he flew worldwide reconnaissance missions in the fastest aircraft in the world, the SR-71 Blackbird. Mr. Behler is a Fellow of the Society of Experimental Test Pilots and an Associate Fellow of the American Institute of Aeronautics and Astronautics. He is a graduate of the University of Oklahoma, where he received a B.S. and M.S. in aerospace engineering; he also holds an MBA from Marymount University and was a National Security Fellow at the JFK School of Government at Harvard University. Mr. Behler has recently served on several National Research Council studies for the National Academy of Sciences, including “Critical Code: Software Producibility,” “Achieving Effective Acquisition of Information Technology in the Department of Defense,” and “Development Planning: A Strategic Approach to Future Air Force Capabilities.” |
Keynote | 2018 |
||
Keynote Closing Remarks (Abstract)
Mr. William (Allen) Kilgore serves as Director, Research Directorate at NASA Langley Research Center. He previously served as Deputy Director of Aerosciences, providing executive leadership and oversight of the Center’s fundamental and applied aerosciences research and technology capabilities, with responsibility for aerosciences experimental and computational research. After being appointed to the Senior Executive Service (SES) in 2013, Mr. Kilgore served as the Deputy Director, Facilities and Laboratory Operations in the Research Directorate. Prior to this position, Mr. Kilgore spent over twenty years in the operations of NASA Langley’s major aerospace research facilities, including budget formulation and execution, maintenance, strategic investments, workforce planning and development, facility advocacy, and integration of facilities’ schedules. During his time at Langley, he has worked in nearly all of the major wind tunnels, with a primary focus on process controls, operations, and testing techniques supporting aerosciences research. For several years, Mr. Kilgore led the National Transonic Facility, the world’s largest cryogenic wind tunnel. Mr. Kilgore has been at NASA Langley Research Center since 1989, starting as a graduate student. He earned a B.S. and M.S. in Mechanical Engineering with a concentration in dynamics and controls from Old Dominion University in 1984 and 1989, respectively. He is the recipient of NASA’s Exceptional Engineering Achievement Medal in 2008 and Exceptional Service Medal in 2012. |
William “Allen” Kilgore Director, Research Directorate NASA Langley Research Center |
Keynote | Session Recording |
Recording | 2021 |
Closing Remarks |
Alyson Wilson NCSU (bio)
Dr. Alyson Wilson is the Associate Vice Chancellor for National Security and Special Research Initiatives at North Carolina State University. She is also a professor in the Department of Statistics and Principal Investigator for the Laboratory for Analytic Sciences. Her areas of expertise include statistical reliability, Bayesian methods, and the application of statistics to problems in defense and national security. Dr. Wilson is a leader in developing transformative models for rapid innovation in defense and intelligence. Prior to joining NC State, Dr. Wilson was a jointly appointed research staff member at the IDA Science and Technology Policy Institute and Systems and Analyses Center (2011-2013); associate professor in the Department of Statistics at Iowa State University (2008-2011); Scientist 5 and technical lead for Department of Defense Programs in the Statistical Sciences Group at Los Alamos National Laboratory (1999-2008); and senior statistician and operations research analyst with Cowboy Programming Resources (1995-1999). She is currently serving on the National Academy of Sciences Committee on Applied and Theoretical Statistics and on the Board of Trustees for the National Institute of Statistical Sciences. Dr. Wilson is a Fellow of the American Statistical Association, the American Association for the Advancement of Science, and an elected member of the International Statistics Institute. |
2022 |
|||
Cloud Computing for Computational Fluid Dynamics (CFD) in T&E (Abstract)
In this talk we’ll focus on the motivation for using cloud computing for Computational Fluid Dynamics (CFD) in Federal Government Test & Evaluation. Using examples from the automotive, aerospace, and manufacturing sectors, we’ll look at benchmarks for a number of CFD codes on CPUs (x86 and Arm) and GPUs, and we’ll look at how the development of high-fidelity CFD (e.g., WMLES and HRLES) is accelerating the need for access to large-scale HPC. The onset of COVID-19 has also meant a large increase in the need for remote visualization, with greater numbers of researchers and engineers needing to work from home. This has likewise accelerated the adoption of the approaches needed for pre- and post-processing of peta/exa-scale CFD simulations, and we’ll look at how these are more easily accessed via cloud infrastructure. Finally, we’ll explore perspectives on integrating ML/AI into CFD workflows using data lakes from a range of sources, and where the next decade may take us. |
Neil Ashton WW Principal CFD Specialist Solution Architect, HPC Amazon Web Services (bio)
Neil Ashton is the worldwide (WW) subject matter expert for CFD within AWS. He works with enterprise, startup, and public-sector customers across the globe to help them run their CFD (and often also FEA) workloads on AWS. In addition, he acts as a key advisor to the global product teams to deliver better hardware and software for CFD and broader CAE users. He also remains very active in academic research on deep learning and machine learning, future HPC approaches, and novel CFD approaches (GPUs, numerical methods, turbulence modelling). |
Session Recording |
Recording | 2022 |
|
Breakout Cognitive Work Analysis – From System Requirements to Validation and Verification (Abstract)
Human-system interaction is a critical yet often neglected aspect of the system development process. It is most commonly incorporated into system performance assessments late in the design process, leaving little opportunity for substantive changes that would ensure satisfactory system performance is achieved. As a result, workarounds and compromises become a patchwork of “corrections” that end up in the final fielded system. But what if mission outcomes, the work context, and performance expectations could be articulated earlier in the process, thereby influencing the development process throughout? This presentation will discuss how a formative method from the field of cognitive systems engineering, cognitive work analysis, can be leveraged to derive design requirements compatible with traditional systems engineering processes. This method establishes not only requirements from which system designs can be constructed, but also how system performance expectations can be more acutely defined a priori to guide the validation and verification process. Cognitive work analysis methods will be described to highlight how ‘cognitive work’ and ‘information relationship’ requirements can be derived, and will be showcased in a case-study application of building a decision support system for future human spaceflight operations. Specifically, a description of the testing campaign employed to verify and validate the fielded system will be provided. In summary, this presentation will cover how system requirements can be established early in the design phase, guide the development of design solutions, and subsequently be used to assess the operational performance of those solutions within the context of the work domain they are intended to support. |
Matthew Miller Exploration Research Engineer Jacobs/NASA Johnson Space Center (bio)
Matthew J. Miller is an Exploration Research Engineer within the Astromaterials Research and Exploration Sciences (ARES) division at NASA Johnson Space Center. His work focuses on advancing present-day tools, technologies and techniques to improve future EVA operations by applying cognitive systems engineering principles. He has over seven years of EVA flight operations and NASA analog experience where he has developed and deployed various EVA support systems and concept of operations. He received a B.S. (2012), M.S. (2014) and Ph.D. (2017) in aerospace engineering from the Georgia Institute of Technology. |
Breakout |
| 2021 |
|
Breakout Collaborative Human AI Red Teaming (Abstract)
The Collaborative Human AI Red Teaming (CHART) project is an effort to develop an AI collaborator that can help human test engineers quickly develop test plans for AI systems. CHART was built around processes developed for cybersecurity red teaming: a goal-focused approach based upon iteratively testing and attacking a system, then updating the tester’s model, to discover novel failure modes not found by traditional T&E processes. Red teaming is traditionally a time-intensive process that requires subject matter experts to study the system they are testing for months in order to develop attack strategies. CHART will accelerate this process by guiding the user through diagramming the AI system under test and drawing upon a pre-established body of knowledge to identify the most probable vulnerabilities. CHART was provided internal seedling funds during FY20 to perform a feasibility study of the technology. During this period the team developed a taxonomy of AI vulnerabilities and an ontology of AI irruptions, where irruptions are events (caused either by a malicious actor or by unintended consequences) that trigger a vulnerability and lead to an undesirable result. Using this taxonomy, we built a threat modeling tool that allows users to diagram their AI system and identifies all the possible irruptions that could occur. This initial demonstration was based around two scenarios: a smartphone-based ECG system for telemedicine and a UAV trained with reinforcement learning to avoid mid-air collisions. In this talk we will first discuss how red teaming differs from adversarial machine learning and traditional testing and evaluation. Next, we will provide an overview of how industry is approaching the problem of AI red teaming and how our approach differs. Finally, we will discuss how we developed our taxonomy of AI vulnerabilities, how to apply goal-focused testing to AI systems, and our strategy for automatically generating test plans. |
Galen Mullins Senior AI Researcher Johns Hopkins University Applied Physics Laboratory (bio)
Dr. Galen Mullins is a senior staff scientist in the Robotics Group of the Intelligent Systems branch at the Johns Hopkins Applied Physics Laboratory. His research is focused on developing intelligent testing techniques and adversarial tools for finding the vulnerabilities of AI systems. His recent project work has included the development of new imitation learning frameworks for modeling the behavior of autonomous vehicles, creating algorithms for generating adversarial environments, and developing red teaming procedures for AI systems. He is the secretary for the IEEE/RAS working group on Guidelines for Verification of Autonomous Systems and teaches the Introduction to Robotics course in the Johns Hopkins Engineering for Professionals program. Dr. Mullins received his B.S. degrees in Mechanical Engineering and Mathematics from Carnegie Mellon University in 2007 and joined APL the same year. He earned his M.S. in Applied Physics from Johns Hopkins University in 2010 and his Ph.D. in Mechanical Engineering from the University of Maryland in 2018. His doctoral research focused on developing active learning algorithms for generating adversarial scenarios for autonomous vehicles. |
Breakout |
| 2021 |
|
Breakout Combinatorial Testing (Abstract)
Combinatorial methods have attracted attention as a means of providing strong assurance at reduced cost. Combinatorial testing takes advantage of the interaction rule, which is based on analysis of thousands of software failures. The rule states that most failures are induced by single-factor faults or by the joint combinatorial effect (interaction) of two factors, with progressively fewer failures induced by interactions between three or more factors. Therefore, if all faults in a system can be induced by a combination of t or fewer parameters, then testing all t-way combinations of parameter values is pseudo-exhaustive and provides a high rate of fault detection. The talk explains the background, methods, and tools available for combinatorial testing. New results on using combinatorial methods for oracle-free testing of certain types of applications will also be introduced. |
Raghu Kacker NIST |
Breakout | Materials | 2017 |
|
Breakout Combinatorial Testing (Abstract)
Combinatorial methods have attracted attention as a means of providing strong assurance at reduced cost. Combinatorial testing takes advantage of the interaction rule, which is based on analysis of thousands of software failures. The rule states that most failures are induced by single-factor faults or by the joint combinatorial effect (interaction) of two factors, with progressively fewer failures induced by interactions between three or more factors. Therefore, if all faults in a system can be induced by a combination of t or fewer parameters, then testing all t-way combinations of parameter values is pseudo-exhaustive and provides a high rate of fault detection. The talk explains the background, methods, and tools available for combinatorial testing. New results on using combinatorial methods for oracle-free testing of certain types of applications will also be introduced. |
Rick Kuhn NIST |
Breakout | Materials | 2017 |
|
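A minimal Python sketch of the pairwise (t = 2) coverage idea described in the Combinatorial Testing abstracts above; the factors, levels, and test suite are hypothetical and chosen only for illustration, not taken from the talks.

```python
# Hypothetical factors/levels and a candidate test suite; verify that every
# 2-way (pairwise) combination of factor levels is covered by at least one test.
from itertools import combinations, product

factors = {
    "os":      ["linux", "windows"],
    "browser": ["chrome", "firefox", "edge"],
    "net":     ["wifi", "wired"],
}

tests = [
    {"os": "linux",   "browser": "chrome",  "net": "wifi"},
    {"os": "linux",   "browser": "firefox", "net": "wired"},
    {"os": "linux",   "browser": "edge",    "net": "wifi"},
    {"os": "windows", "browser": "chrome",  "net": "wired"},
    {"os": "windows", "browser": "firefox", "net": "wifi"},
    {"os": "windows", "browser": "edge",    "net": "wired"},
]

def uncovered_pairs(factors, tests):
    """Return every 2-way combination of levels not exercised by any test."""
    missing = []
    for f1, f2 in combinations(factors, 2):
        for v1, v2 in product(factors[f1], factors[f2]):
            if not any(t[f1] == v1 and t[f2] == v2 for t in tests):
                missing.append(((f1, v1), (f2, v2)))
    return missing

print(uncovered_pairs(factors, tests))   # [] -> the suite is pairwise covering
```

Here six tests cover all 16 possible pairs, whereas exhaustive testing of even this toy system would take 12 runs; the gap grows rapidly with more factors and levels, which is the cost reduction the abstracts describe.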
Short Course Combinatorial Interaction Testing (Abstract)
This mini-tutorial provides an introduction to combinatorial interaction testing (CIT). The main idea behind CIT is to pseudo-exhaustively test software and hardware systems by covering combinations of components in order to detect faults. In 90 minutes, we provide an overview of this domain that includes the following topics: the role of CIT in software and hardware testing, how it complements and differs from design of experiments, considerations such as variable strength and constraints, the typical combinatorial arrays used for constructing test suites, and existing tools for test suite construction. Last, defense systems are increasingly relying on software with embedded machine learning (ML), yet ML poses unique challenges to applying conventional software testing due to characteristics such as the large input space, effort required for white box testing, and emergent behaviors apparent only at integration or system levels. As a well-studied black box approach to testing integrated systems with a pseudo-exhaustive strategy for handling large input spaces, CIT provides a good foundation for testing ML. In closing, we present recent research adapting concepts of combinatorial coverage to test design for ML. |
Erin Lanus Research Assistant Professor Virginia Tech (bio)
Erin Lanus is a Research Assistant Professor at the Hume Center for National Security and Technology at Virginia Tech. She has a Ph.D. in Computer Science with a concentration in cybersecurity from Arizona State University. Her experience includes work as a Research Fellow at University of Maryland Baltimore County and as a High Confidence Software and Systems Researcher with the Department of Defense. Her current interests are software and combinatorial testing, machine learning in cybersecurity, and artificial intelligence assurance. |
Short Course | Session Recording |
Materials | Recording | 2021 |
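As a companion to the closing topic of this short course (combinatorial coverage for ML test design), the sketch below computes a simple 2-way combinatorial coverage measure for a dataset. The feature names, levels, and records are hypothetical, and this is only one simplified way to operationalize the idea, not the presenter's tooling.

```python
# Fraction of all possible 2-way feature-value combinations that appear in the data.
from itertools import combinations, product

levels = {"sensor": ["EO", "IR"], "weather": ["clear", "rain", "fog"], "time": ["day", "night"]}
dataset = [
    {"sensor": "EO", "weather": "clear", "time": "day"},
    {"sensor": "EO", "weather": "rain",  "time": "night"},
    {"sensor": "IR", "weather": "clear", "time": "night"},
    {"sensor": "IR", "weather": "fog",   "time": "day"},
]

covered = total = 0
for f1, f2 in combinations(levels, 2):
    for v1, v2 in product(levels[f1], levels[f2]):
        total += 1
        covered += any(r[f1] == v1 and r[f2] == v2 for r in dataset)

print(f"2-way combinatorial coverage: {covered}/{total} = {covered / total:.0%}")
```

Value combinations that never appear flag regions of the input space where the ML model remains untested.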
Breakout Combinatorial Testing for Link-16 Developmental Test and Evaluation (Abstract)
Due to small Tactical Data Link testing windows, only commonly used messages are tested, resulting in the evaluation of only a small subset of all possible Link 16 messages. To increase the confidence that software design and implementation issues are discovered in the earliest phases of government acceptance testing, the Marine Corps Tactical Systems Support Activity (MCTSSA) Instrumentation and Data Management Section (IDMS) successfully implemented an extension of the traditional form of Design of Experiments (DOE) called Combinatorial Testing (CT). CT was utilized to reduce the human bias and inconsistencies involved in Link 16 testing and replace them with a thorough test that can validate a system’s ability to properly consume all of the possible valid combinations of Link 16 message field values. MCTSSA’s unique team of subject matter experts was able to bring together the tenets of virtualization, automation, C4I Air systems testing, tactical data link testing, and Design of Experiments methodology to invent a testing paradigm that will exhaustively evaluate tactical Air systems. This presentation will give an overview of how CT was implemented for the test. |
Tim Mclean MCTSSA |
Breakout | Materials | 2017 |
|
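The sketch below shows one generic way to construct a reduced pairwise test set with a greedy algorithm. The message fields and values are hypothetical placeholders rather than actual Link-16 message definitions, and this is not MCTSSA's implementation; it only illustrates why CT needs far fewer tests than exhaustive enumeration.

```python
# Greedy construction of a pairwise-covering test set for hypothetical message fields.
from itertools import combinations, product

fields = {
    "msg_type":      ["type_A", "type_B", "type_C"],
    "identity":      ["friend", "hostile", "unknown"],
    "alt_source":    ["barometric", "geometric"],
    "exercise_flag": ["yes", "no"],
}

def greedy_pairwise(fields):
    names = list(fields)
    uncovered = {((f1, v1), (f2, v2))
                 for f1, f2 in combinations(names, 2)
                 for v1, v2 in product(fields[f1], fields[f2])}
    suite = []
    while uncovered:
        best, best_gain = None, -1
        # Exhaustive candidate scan: fine for this toy space, not for a full message catalog.
        for values in product(*(fields[n] for n in names)):
            test = dict(zip(names, values))
            gain = sum(test[f1] == v1 and test[f2] == v2
                       for (f1, v1), (f2, v2) in uncovered)
            if gain > best_gain:
                best, best_gain = test, gain
        suite.append(best)
        uncovered = {((f1, v1), (f2, v2)) for (f1, v1), (f2, v2) in uncovered
                     if not (best[f1] == v1 and best[f2] == v2)}
    return suite

suite = greedy_pairwise(fields)
print(f"{len(suite)} tests cover all pairs; exhaustive testing would need "
      f"{3 * 3 * 2 * 2} combinations")
```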
Poster Combining data from scanners to inform cadet physical performance (Abstract)
Digital anthropometry obtained from 3D body scanners has already revolutionized the clothing and fitness industries. Within seconds, these scanners collect hundreds of anthropometric measurements, which are used by tailors to customize an article of clothing or by fitness trainers to track their clients’ progress toward a goal. Three-dimensional body scanners have also been used in military applications, such as predicting injuries at Army basic training and checking a soldier’s compliance with body composition standards. In response to this increased demand, several 3D body scanners have become commercially available, each with a proprietary algorithm for measuring specific body parts. Individual scanners may suffice to collect measurements from a small population; however, they are not practical for creating the large data sets necessary to train artificial intelligence (AI) or machine learning algorithms. This study fills the gap between these two applications by correlating body circumferences taken from a small population (n = 109) on three different body scanners and creating a standard scale for pooling data from the different scanners into one large AI-ready data set. This data set is then leveraged in a separate application to understand the relationship between body shape and performance on the Army Combat Fitness Test (ACFT). |
Nicholas Ashby Student United States Military Academy (bio)
Nicholas (Nick) Ashby is a fourth-year cadet at the United States Military Academy. He was born in California’s Central Coast and grew up in Charlottesville, VA. At West Point he is pursuing a B.S. in Applied Statistics and Data Science and will commission as an Army Aviation officer in May. In his free time Nick works as a student manager and data analyst for Army’s NCAA Division 1 baseball team and he enjoys playing golf as well. His research, under advisor Dr. Diana M. Thomas, has focused on body shape, body composition, and performance on the U.S. Army Combat Fitness Test (ACFT). |
Poster | Session Recording |
Recording | 2022 |
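One plausible, deliberately simplified way to put circumference measurements from different scanners on a common scale before pooling is a per-measurement linear calibration against a chosen reference scanner. The sketch below uses synthetic data and is an illustration of the concept only, not the study's actual harmonization method.

```python
# Synthetic example: map a second scanner's waist circumferences onto the
# reference scanner's scale with a least-squares linear calibration.
import numpy as np

rng = np.random.default_rng(0)
ref = rng.normal(95.0, 8.0, size=109)                  # reference scanner (cm), hypothetical
other = 1.03 * ref - 2.0 + rng.normal(0.0, 0.8, 109)   # second scanner: gain, offset, noise

slope, intercept = np.polyfit(other, ref, deg=1)       # fit: ref ~ slope*other + intercept
harmonized = slope * other + intercept                 # second scanner on the reference scale

print(f"calibration: ref = {slope:.3f} * other + {intercept:.2f}")
print(f"means before/after: {other.mean():.1f} -> {harmonized.mean():.1f} (ref {ref.mean():.1f})")
```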
Breakout Combining Human Factors Data and Models of Human Performance (Abstract)
As systems and missions become increasingly complex, the roles of humans throughout the mission life cycle are evolving. In areas such as maintenance and repair, hands-on tasks still dominate; however, new technologies have changed many tasks. For example, some critical human tasks have moved from manual control to supervisory control, often of systems at great distances (e.g., remotely piloting a vehicle, or science data collection on Mars). While achieving mission success remains the key human goal, almost all human performance metrics focus on failures rather than successes. This talk will examine the role of humans in creating mission success as well as new approaches for system validation testing needed to keep up with evolving systems and human roles. |
Cynthia Null Technical Fellow for Human Factors |
Breakout | Materials | 2018 |
|
Breakout Combining Information for Reliability Assessment - Tuesday Morning |
Alyson Wilson North Carolina State University |
Breakout | Materials | 2016 |
|
Breakout Communicating Complex Statistical Methodologies to Leadership (Abstract)
More often than not, the data we analyze for the military is plagued with statistical issues. Multicollinearity, small sample sizes, quasi-experimental designs, and convenience samples are some examples of what we commonly see in military data. Many of these complications can be resolved either in the design or analysis stage with appropriate statistical procedures. But, to keep our work useful, usable, and transparent to the military leadership who sponsors it, we must strike the elusive balance between explaining and justifying our design and analysis techniques and not inundating our audience with unnecessary details. It can be even more difficult to get military leadership to understand the statistical problems and solutions so well that they are enthused and supportive of our approaches. Using literature written on the subject as well as a variety of experiences, we will showcase several examples, as well as present ideas for keeping our clients actively engaged in statistical methodology discussions. |
Jane Pinelis Johns Hopkins University Applied Physics Lab or JHU |
Breakout | Materials | 2017 |
|
Breakout Communicating Complex Statistical Methodologies to Leadership (Abstract)
More often than not, the data we analyze for the military is plagued with statistical issues. Multicollinearity, small sample sizes, quasi-experimental designs, and convenience samples are some examples of what we commonly see in military data. Many of these complications can be resolved either in the design or analysis stage with appropriate statistical procedures. But, to keep our work useful, usable, and transparent to the military leadership who sponsors it, we must strike the elusive balance between explaining and justifying our design and analysis techniques and not inundating our audience with unnecessary details. It can be even more difficult to get military leadership to understand the statistical problems and solutions so well that they are enthused and supportive of our approaches. Using literature written on the subject as well as a variety of experiences, we will showcase several examples, as well as present ideas for keeping our clients actively engaged in statistical methodology discussions. |
Paul Johnson MCOTEA |
Breakout | Materials | 2017 |
|
Breakout Communicating Statistical Concepts and Results: Lessons Learned from the US Service Academies (Abstract)
Communication is critical both for analysts and for the decision-makers who rely on analysis to inform their choices. The Service Academies are responsible for educating men and women who may serve in both roles over the course of their careers. Analysts must be able to summarize their results concisely and communicate them to the decision-maker in a way that is relevant and actionable. Decision-makers must understand that analytical results may carry uncertainty and be able to incorporate this uncertainty properly when evaluating different options. This panel explores the role of the US Service Academies in preparing their students for both roles. Featuring representatives from the US Air Force Academy, the US Naval Academy, and the US Military Academy, this panel will cover how future US officers are taught to use and communicate with data. Topics include developing and motivating numerical literacy, understanding of uncertainty, how analysts should frame uncertainty to decision-makers, and how decision-makers should interpret information presented with uncertainty. Panelists will discuss what they think the academies do well and areas that are ripe for improvement. |
Panel Discussion | Breakout | 2019 |
||
Breakout Communication in Statistics & the Five Hardest Concepts |
Jennifer Van-Mellekom Virginia Tech |
Breakout | 2017 |
||
Breakout Comparing Experimental Designs (Abstract)
This tutorial will show how to compare and choose experimental designs based on multiple criteria. Questions like “Which Design of Experiments (DOE) is better/best?” will be answered by looking at both data and graphics that show the relative performance of the designs on multiple criteria, including: power of the designs for different model terms; how well the designs minimize predictive variance across the design space; to what level model terms are confounded or correlated; and the relative efficiencies that measure how well coefficients are estimated or how well predictive variance is minimized. Many different case studies of screening designs, response surface designs, and screening designs augmented to response surface designs will be compared. Designs with both continuous and categorical factors, and with constraints on the experimental region, will also be compared. |
Tom Donnelly JMP |
Breakout | Materials | 2017 |
|
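The sketch below illustrates two of the comparison criteria named in the abstract, D-efficiency and correlation (confounding) among model terms, for a 2^3 full factorial versus its half fraction. The designs and model are a toy example, not the tutorial's case studies or JMP output.

```python
# Compare two designs for a model with an intercept, main effects, and two-factor interactions.
import numpy as np
from itertools import product

def model_matrix(runs):
    """Columns: intercept, A, B, C, AB, AC, BC for a +/-1 coded 3-factor design."""
    R = np.array(runs, dtype=float)
    A, B, C = R[:, 0], R[:, 1], R[:, 2]
    return np.column_stack([np.ones(len(R)), A, B, C, A * B, A * C, B * C])

def d_efficiency(X):
    n, p = X.shape
    det = np.linalg.det(X.T @ X)
    det = det if det > 1e-9 else 0.0          # singular information matrix -> 0% efficiency
    return 100 * det ** (1 / p) / n

full = list(product([-1, 1], repeat=3))                 # 2^3 full factorial (8 runs)
half = [r for r in full if r[0] * r[1] * r[2] == 1]     # half fraction, I = ABC (4 runs)

for name, runs in [("full 2^3", full), ("half fraction", half)]:
    X = model_matrix(runs)
    corr = np.corrcoef(X[:, 1:], rowvar=False)          # term-vs-term correlations
    worst = np.abs(corr - np.eye(corr.shape[0])).max()  # worst confounding
    print(f"{name:14s} D-eff = {d_efficiency(X):5.1f}%  max |corr| = {worst:.2f}")
```

The full factorial supports the full model with 100% D-efficiency and no correlation among terms, while the half fraction aliases two-factor interactions with main effects (maximum correlation of 1), which is exactly the kind of trade-off these comparisons expose.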
Breakout Comparing M&S Output to Live Test Data: A Missile System Case Study (Abstract)
In the operational testing of DoD weapons systems, modeling and simulation (M&S) is often used to supplement live test data in order to support a more complete and rigorous evaluation. Before the output of the M&S is included in reports to decision makers, it must first be thoroughly verified and validated to show that it adequately represents the real world for the purposes of the intended use. Part of the validation process should include a statistical comparison of live data to M&S output. This presentation includes an example of one such validation analysis for a tactical missile system. In this case, the goal is to validate a lethality model that predicts the likelihood of destroying a particular enemy target. Using design of experiments, along with basic analysis techniques such as the Kolmogorov-Smirnov test and Poisson regression, we can explore differences between the M&S and live data across multiple operational conditions and quantify the associated uncertainties. |
Kelly Avery Research Staff Member IDA |
Breakout | Materials | 2018 |
|
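A minimal sketch, with synthetic data rather than the missile case study, of the kind of distributional comparison the abstract mentions: a two-sample Kolmogorov-Smirnov test between live observations and M&S output for one operational condition.

```python
# Compare a small live-test sample against many M&S replications with a two-sample KS test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
live = rng.gamma(shape=2.0, scale=3.0, size=12)     # e.g., 12 live miss distances (synthetic)
sim  = rng.gamma(shape=2.0, scale=3.5, size=500)    # many simulated miss distances (synthetic)

stat, p = stats.ks_2samp(live, sim)
print(f"KS statistic = {stat:.3f}, p-value = {p:.3f}")
# A small p-value flags a live-vs-M&S difference worth investigating before accepting
# the model for this condition; in practice the comparison is repeated across DOE cells.
```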
Breakout Comparison of Methods for Testing Uniformity to Support the Validation of Simulation Models used for Live-Fire Testing (Abstract)
Goodness-of-fit (GOF) testing is used in many applications, including statistical hypothesis testing to determine if a set of data comes from a hypothesized distribution. In addition, combined probability tests are extensively used in meta-analysis to combine results from several independent tests to assess an overall null hypothesis. This paper summarizes a study conducted to determine which GOF and/or combined probability test(s) can be used to determine if a set of data with a relatively small sample size comes from the standard uniform distribution, U(0,1). The power of several GOF tests and combined probability methods against different alternative hypotheses was examined. The GOF methods included Anderson-Darling, Chi-Square, Kolmogorov-Smirnov, Cramér-von Mises, Neyman-Barton, Dudewicz-van der Meulen, Sherman, Quesenberry-Miller, Frosini, and Hegazy-Green, while the combined probability test methods included Fisher’s Combined Probability Test, Mean Z, Mean P, Maximum P, Minimum P, Logit P, and Sum Z. While no one method was found to provide the best power in all situations, several useful methods to support model validation were identified. |
Shannon Shelburne | Breakout |
| 2019 |
|
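The sketch below, using synthetic data, shows two building blocks discussed in the abstract: a Kolmogorov-Smirnov goodness-of-fit test against U(0,1) and Fisher's combined probability test for pooling p-values from independent tests. It is illustrative only and does not reproduce the paper's power study.

```python
# GOF test of uniformity plus Fisher's method for combining independent p-values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
u = rng.uniform(size=20)                         # e.g., 20 values hypothesized to be U(0,1)

ks_stat, ks_p = stats.kstest(u, "uniform")       # H0: data ~ U(0,1)
print(f"KS: statistic = {ks_stat:.3f}, p = {ks_p:.3f}")

p_values = [0.40, 0.07, 0.62, 0.15]              # hypothetical p-values from separate tests
chi2, fisher_p = stats.combine_pvalues(p_values, method="fisher")
print(f"Fisher: chi-square = {chi2:.2f}, combined p = {fisher_p:.3f}")
```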
Breakout Computing Statistical Tolerance Regions Using the R Package ‘tolerance’ (Abstract)
Statistical tolerance intervals of the form (1−α, P) provide bounds to capture at least a specified proportion P of the sampled population with a given confidence level 1−α. The quantity P is called the content of the tolerance interval and the confidence level 1−α reflects the sampling variability. Statistical tolerance intervals are ubiquitous in regulatory documents, especially regarding design verification and process validation. Examples of such regulations are those published by the Food and Drug Administration (FDA), the Environmental Protection Agency (EPA), the International Atomic Energy Agency (IAEA), and the standard 16269-6 of the International Organization for Standardization (ISO). Research and development in the area of statistical tolerance intervals has undoubtedly been guided by the needs and demands of industry experts. Some of the broad applications of tolerance intervals include their use in quality control of drug products, setting process validation acceptance criteria, establishing sample sizes for process validation, assessing biosimilarity, and establishing statistically-based design limits. While tolerance intervals are available for numerous parametric distributions, procedures are also available for regression models, mixed-effects models, and multivariate settings (i.e., tolerance regions). Alternatively, nonparametric procedures can be employed when assumptions of a particular parametric model are not met. Tools for computing such tolerance intervals and regions are a necessity for researchers and practitioners alike. This was the motivation for designing the R package ‘tolerance,’ which not only has the capability of computing a wide range of tolerance intervals and regions for both standard and non-standard settings, but also includes some supplementary visualization tools. This session will provide a high-level introduction to the ‘tolerance’ package and its many features. Relevant data examples will be integrated with the computing demonstration, and specifically designed to engage researchers and practitioners from industry and government. A recently-launched Shiny app corresponding to the package will also be highlighted. |
Derek Young Associate Professor of Statistics University of Kentucky (bio)
Derek Young received his PhD in Statistics from Penn State University in 2007, where his research focused on computational aspects of novel finite mixture models. He subsequently worked as a Senior Statistician for the Naval Nuclear Propulsion Program (Bettis Lab) for 3.5 years and then as a Research Mathematical Statistician for the US Census Bureau for 3 years. He then joined the faculty of the Department of Statistics at the University of Kentucky in the fall of 2014, where he is currently a tenured Associate Professor. While at the Bettis Lab, he engaged with engineers and nuclear regulators, often regarding the calculation of tolerance regions. While at the Census Bureau, he wrote several methodological and computational papers on applied survey data analysis, many as the sole author. Since being at the University of Kentucky, he has further progressed his research agenda in finite mixture modeling, zero-inflated modeling, and tolerance regions. He also has extensive teaching experience spanning numerous undergraduate and graduate Statistics courses, as well as professional development presentations in Statistics. |
Breakout | Session Recording |
Recording | 2022 |
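The session centers on the R package 'tolerance'; as a language-neutral illustration of the underlying concept, the sketch below computes a one-sided (1 − α, P) upper tolerance bound for normal data via the noncentral t distribution. The data are hypothetical, and the normality assumption belongs to this sketch, not to any particular application in the talk.

```python
# One-sided normal tolerance bound: exceed at least `coverage` of the population
# with confidence `confidence`, using the exact noncentral-t tolerance factor.
import numpy as np
from scipy import stats

def normal_upper_tolerance_bound(x, coverage=0.90, confidence=0.95):
    x = np.asarray(x, dtype=float)
    n = x.size
    delta = stats.norm.ppf(coverage) * np.sqrt(n)            # noncentrality parameter
    k = stats.nct.ppf(confidence, df=n - 1, nc=delta) / np.sqrt(n)
    return x.mean() + k * x.std(ddof=1)

rng = np.random.default_rng(3)
data = rng.normal(loc=100.0, scale=5.0, size=30)             # hypothetical measurements
print(f"(0.95, 0.90) upper tolerance bound: {normal_upper_tolerance_bound(data):.2f}")
```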
Webinar Connecting Software Reliability Growth Models to Software Defect Tracking (Abstract)
Co-Author: Melanie Luperon. Most software reliability growth models only track defect discovery. However, a practical concern is the removal of high-severity defects, yet defect removal is often assumed to occur instantaneously. More recently, several defect removal models have been formulated as differential equations in terms of the number of defects discovered but not yet resolved and the rate of resolution. The limitation of this approach is that it does not take into consideration data contained in a defect tracking database. This talk describes our recent efforts to analyze data from a NASA program. Two methods to model defect resolution are developed, namely (i) distributional and (ii) Markovian approaches. The distributional approach employs times between defect discovery and resolution to characterize the mean resolution time and derives a software defect resolution model from the corresponding software reliability growth model that tracks defect discovery. The Markovian approach develops a state model from the stages of the software defect lifecycle as well as a transition probability matrix and the distributions for each transition, providing a semi-Markov model. Both the distributional and Markovian approaches employ a censored estimation technique to identify the maximum likelihood estimates, in order to handle the case where some but not all of the defects discovered have been resolved. Furthermore, we apply a hypothesis test to determine if a first- or second-order Markov chain best characterizes the defect lifecycle. Our results indicate that a first-order Markov chain was sufficient to describe the data considered and that the Markovian approach achieves modest improvements in predictive accuracy, suggesting that the simpler distributional approach may be sufficient to characterize the software defect resolution process during test. The practical inferences of such models include an estimate of the time required to discover and remove all defects. |
Lance Fiondella Associate Professor University of Massachusetts (bio)
Lance Fiondella is an associate professor of Electrical and Computer Engineering at the University of Massachusetts Dartmouth. He received his PhD (2012) in Computer Science and Engineering from the University of Connecticut. Dr. Fiondella’s papers have received eleven conference paper awards, including six with his students. His software and system reliability and security research has been funded by the DHS, NASA, Army Research Laboratory, Naval Air Warfare Center, and National Science Foundation, including a CAREER Award. |
Webinar | Session Recording |
Recording | 2020 |
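A simplified sketch of the censored-estimation idea mentioned in the abstract: with synthetic defect data and an assumed exponential resolution-time distribution (an assumption of this sketch, not necessarily the talk's model), the maximum likelihood estimate of the resolution rate accounts for defects that are still open at the end of the observation window.

```python
# Right-censored MLE of mean defect-resolution time under an exponential model.
import numpy as np

rng = np.random.default_rng(4)
true_mean = 12.0                                      # days (synthetic ground truth)
resolution = rng.exponential(true_mean, size=40)      # eventual resolution times of 40 defects
window = 20.0                                         # days of defect-tracking history observed

resolved = resolution[resolution <= window]           # defects closed within the window
n_open = int(np.sum(resolution > window))             # defects still open (right-censored)

# Exponential censored MLE: rate = (# resolved) / (total exposure time),
# where each open defect contributes its censoring time to the exposure.
total_time = resolved.sum() + n_open * window
rate_hat = len(resolved) / total_time
print(f"estimated mean resolution time = {1 / rate_hat:.1f} days "
      f"({len(resolved)} resolved, {n_open} open)")
```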
Keynote Consensus Building |
Antonio Possolo NIST Fellow, Chief Statistician National Institute of Standards and Technology (bio)
Antonio Possolo holds a Ph.D. in statistics from Yale University, and has been practicing the statistical arts for more than 35 years, in industry (General Electric, Boeing), academia (Princeton University, University of Washington in Seattle, Classical University of Lisboa), and government. He is committed to the development and application of probabilistic and statistical methods that contribute to advances in science and technology, and in particular to measurement science. |
Keynote | Materials | 2018 |
|
Breakout Constructing Designs for Fault Location (Abstract)
While fault testing a system with many factors, each appearing at some number of levels, it may not be possible to test all combinations of factor levels. Most faults are caused by interactions of only a few factors, so testing interactions up to size t will often find all faults in the system without executing an exhaustive test suite. Call an assignment of levels to t of the factors a t-way interaction. A covering array is a collection of tests that ensures that every t-way interaction is covered by at least one test in the test suite. Locating arrays extend covering arrays with the additional feature that they not only indicate the presence of faults but locate the faulty interactions when there are no more than d faults in the system. If an array is (d, t)-locating, for every pair of sets of t-way interactions of size d, the interactions do not appear in exactly the same tests. This ensures that the faulty interactions can be differentiated from non-faulty interactions by the results of some test in which interactions from one set or the other, but not both, are tested. When the property holds for t-way interaction sets of size up to d, the notation (d, t̄) is used. In addition to fault location, locating arrays have also been used to identify significant effects in screening experiments. Locating arrays are fairly new, and few techniques have been explored for their construction. Most of the available work is limited to finding only one fault (d = 1). Known general methods require a covering array of strength t + d and produce many more tests than are needed. In this talk, we present Partitioned Search with Column Resampling (PSCR), a computational search algorithm to verify if an array is (d, t̄)-locating by partitioning the search space to decrease the number of comparisons. If a candidate array is not locating, random resampling is performed until a locating array is constructed or an iteration limit is reached. Algorithmic parameters determine which factor columns to resample and when to add additional tests to the candidate array. We use a 5 × 5 × 3 × 2 × 2 full factorial design to analyze the performance of the algorithmic parameters and provide guidance on how to tune parameters to prioritize speed, accuracy, or a combination of both. Last, we compare our results to the number of tests in locating arrays produced by other methods for the factors and levels of real-world systems. |
Erin Lanus | Breakout |
| 2019 |
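The sketch below checks the d = 1, t = 2 case of the locating property defined in the abstract on a small toy array (not one of the talk's constructions): an array is (1, 2)-locating only if no two 2-way interactions are covered by exactly the same set of tests, so a single faulty interaction can be identified from test outcomes.

```python
# Check whether a toy test array has the (1, 2)-locating property.
from itertools import combinations

array = [          # rows = tests, columns = factors, entries = levels (coded 0/1)
    [0, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
]

def is_1_2_locating(array):
    k = len(array[0])
    coverage = {}
    for c1, c2 in combinations(range(k), 2):
        for v1 in {row[c1] for row in array}:
            for v2 in {row[c2] for row in array}:
                rows = frozenset(i for i, row in enumerate(array)
                                 if row[c1] == v1 and row[c2] == v2)
                coverage[((c1, v1), (c2, v2))] = rows
    row_sets = list(coverage.values())
    return len(row_sets) == len(set(row_sets))   # all coverage sets distinct => locating

# False here: the interactions (col0=0, col2=0) and (col1=0, col2=0) are both
# covered only by test 0, so faults in them could not be told apart.
print(is_1_2_locating(array))
```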