To evaluate the quality of human-system interaction (HSI), testers commonly need to measure usability, workload, training effectiveness, and trust. As with all measurement, testers should measure these concepts as precisely as possible, using validated scales to minimize measurement error. In the sections that follow, we identify validated scales for each of these concepts and provide information about their use, including:
- Name(s), including acronyms
- What it measures
- Information for creating your own survey forms, including questions, anchors, and administration instructions
- Instructions on scoring. If there are multiple valid ways to score a scale, all of them are listed.
- Pseudocode (not specific to any programming language) showing how to score each scale in programs such as Excel, SPSS, Stata, R, and Python.
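The following Python function is a minimal sketch of this kind of scoring logic. It scores one respondent's System Usability Scale (SUS) form. It assumes the standard 10-item SUS, with responses coded 1 (strongly disagree) to 5 (strongly agree) and entered in questionnaire order. The function name `score_sus` is hypothetical. Odd-numbered SUS items are positively worded and even-numbered items are negatively worded, so the two groups are scored in opposite directions before the total is rescaled to 0-100.

```python
from typing import Sequence


def score_sus(responses: Sequence[int]) -> float:
    """Score one respondent's SUS form (10 items, each coded 1-5)."""
    if len(responses) != 10:
        raise ValueError("SUS expects exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS responses must be integers from 1 to 5")
    total = 0
    for item_number, r in enumerate(responses, start=1):
        # Odd-numbered (positively worded) items contribute response - 1;
        # even-numbered (negatively worded) items contribute 5 - response.
        total += (r - 1) if item_number % 2 == 1 else (5 - r)
    # Each contribution is 0-4, so the total is 0-40; rescale it to 0-100.
    return total * 2.5


print(score_sus([4, 2, 4, 1, 5, 2, 4, 2, 5, 1]))  # 85.0
```

To score a group, apply the function to each respondent's row of responses and then summarize the resulting scores.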
If you have any questions, please contact the Test Science team at firstname.lastname@example.org for advice.
The table below provides an overview of the validated scales that DOT&E has approved for use in operational test and evaluation.
Note: There are no scales that measure situational awareness (SA) validly and reliably. Some scales measure perceived SA, and these are briefly discussed in a final section. While potentially valuable, these measures are not valid for evaluating a requirement to increase operator SA. If testers need to measure actual (as opposed to perceived) SA, they should use a behavioral measure.
| Measures | Links | Acronym | Scale Name | Advantages | Disadvantages | Subscales | No. of Items |
|---|---|---|---|---|---|---|---|
| Usability | S P | SUS | System Usability Scale | Widely given | Long. More complicated scoring | Overall | 10 |
| | S P | UMUX | Usability Metric for User Experience | Shorter than SUS. Based on ISO9241 definition of usability. | Reverse-scored items can confuse people | Overall | 4 |
| | S P | UMUX-LITE | Usability Metric for User Experience Lite | Short. Predicts SUS scores with high accuracy and correlates with NPS | Fewer outcome scores | Overall | 2 |
| Workload | S P I | NASA-TLX | NASA Task Load Index | Free app. Task agnostic | Long. Original scoring is complicated. | Overall | 6 |
| | S P | ARWES/CSS | AFFTC Revised Workload Estimate Scale | Short (1 Q) | Small pool of data for comparison | Overall | 1 |
| Training Effectiveness | S | OATS | Operational Assessment of Training Scale | Construct subscales | Currently undergoing validation | Relevance | 9 |
| | S | DSoT | Diagnostic Survey of Training | Helpful for improving training | Not validated. Only used as a supplement | Course | 8 |
| Trust | S P | TOAST | Trust of Automated Systems Test | Subscales | Currently undergoing validation | Understanding | 4 |
Key: I = Instruction manual. NPS = Net promoter score. P = Paper. S = Scale. * = Weights only need to be filled out once for each task type.
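UMUX and UMUX-LITE are scored on a 7-point scale. The sketch below assumes the following:

* Responses are coded 1 (strongly disagree) to 7 (strongly agree).
* The UMUX items are in their published order. Items 1 and 3 are positively worded, and items 2 and 4 are negatively worded.
* UMUX-LITE consists of UMUX items 1 and 3.

The function names are hypothetical. Both scores are rescaled to 0-100. This sketch computes the unadjusted UMUX-LITE score. It does not apply any regression adjustment used to map UMUX-LITE onto the SUS range.

```python
from typing import Sequence


def _check(responses: Sequence[int], n_items: int) -> None:
    # Require the expected number of items, each coded 1-7.
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    if any(r not in range(1, 8) for r in responses):
        raise ValueError("responses must be integers from 1 to 7")


def score_umux(responses: Sequence[int]) -> float:
    """UMUX: items 1 and 3 positively worded, items 2 and 4 negatively worded."""
    _check(responses, 4)
    # Positive items contribute (response - 1); negative items contribute (7 - response).
    total = sum((r - 1) if i % 2 == 1 else (7 - r)
                for i, r in enumerate(responses, start=1))
    # Each contribution is 0-6, so the maximum total is 24; rescale to 0-100.
    return total / 24 * 100


def score_umux_lite(responses: Sequence[int]) -> float:
    """UMUX-LITE: the two positively worded UMUX items (items 1 and 3)."""
    _check(responses, 2)
    # Each item contributes (response - 1), so the maximum total is 12.
    return sum(r - 1 for r in responses) / 12 * 100


print(score_umux([6, 2, 5, 3]))  # 75.0
print(score_umux_lite([6, 5]))   # 75.0
```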
Information for administering each scale is included below, including its title, citation information, individual items, scoring criteria, and other relevant details.
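NASA-TLX has two common scoring methods, and the sketch below shows both. It assumes six subscale ratings on a 0-100 scale.

* **Raw TLX (unweighted):** the mean of the six ratings.
* **Original weighted score:** each rating is multiplied by its weight, and the sum is divided by 15. A subscale's weight is the number of times it was chosen across the 15 pairwise comparisons, so each weight is 0 to 5 and the weights sum to 15.

The subscale keys and function names are hypothetical labels. The table key notes that weights need to be filled out only once per task type. Accordingly, the same `weights` mapping can be reused for every set of ratings collected on that task type.

```python
from typing import Mapping

# Hypothetical labels for the six NASA-TLX subscales.
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def _validate_ratings(ratings: Mapping[str, float]) -> None:
    # Require exactly the six subscales, each rated on the 0-100 scale.
    if set(ratings) != set(SUBSCALES):
        raise ValueError(f"ratings must have exactly these keys: {SUBSCALES}")
    if any(not 0 <= ratings[s] <= 100 for s in SUBSCALES):
        raise ValueError("each rating must be between 0 and 100")


def raw_tlx(ratings: Mapping[str, float]) -> float:
    """Unweighted ("Raw TLX") score: the mean of the six subscale ratings."""
    _validate_ratings(ratings)
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)


def weighted_tlx(ratings: Mapping[str, float], weights: Mapping[str, int]) -> float:
    """Original weighted score.

    weights[s] is how many of the 15 pairwise comparisons chose subscale s
    (0-5 each, summing to 15).
    """
    _validate_ratings(ratings)
    if set(weights) != set(SUBSCALES):
        raise ValueError(f"weights must have exactly these keys: {SUBSCALES}")
    if any(not 0 <= weights[s] <= 5 for s in SUBSCALES) or sum(weights.values()) != 15:
        raise ValueError("weights must each be 0-5 and sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15


ratings = {"mental": 70, "physical": 20, "temporal": 50,
           "performance": 30, "effort": 60, "frustration": 40}
# Weights from one set of pairwise comparisons for this task type.
weights = {"mental": 5, "physical": 0, "temporal": 3,
           "performance": 2, "effort": 4, "frustration": 1}
print(raw_tlx(ratings))                # 45.0
print(weighted_tlx(ratings, weights))  # 56.0
```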
For information about the importance of trust in automation, see Lee & See (2004):
Lee, J.D., & See, K.A. (2004). Trust in Automation: Designing for Appropriate Reliance. Human Factors, 46(1), 50–80. doi: 10.1518/hfes.46.1.50_30392
As mentioned previously, we strongly recommend measuring situational awareness (SA) with behavioral measures tied to mission-critical outcomes. Techniques for measuring actual SA typically do not involve scales, so we do not include them in this repository. For an overview of these techniques (e.g., the Situation Awareness Global Assessment Technique, or SAGAT), including their benefits and limitations, please see this external repository: However, not all of these techniques are appropriate for every system or test, and the details should be worked out at the program level.
In certain situations, it may be important to measure perceived SA, which, unlike actual SA, can be measured with a scale. However, we do not include perceived SA scales here for two reasons. First, in most cases perceived SA is not what testers need to measure. Second, efforts to validate commonly used perceived SA scales have often found that they measure other HSI concepts (e.g., workload).