Overview

User trust is a measure of whether the user believes the system will act appropriately in a given situation, and it can be useful for predicting how the user will (or will not) interact with a system. The Test Science team at IDA has developed and is in the process of validating one scale for measuring trust in automated systems, the Trust in Automated Systems Test (TOAST). References for additional scales worth considering are also included.

Planned future updates to the Validated Scales Repository will expand this section.

Summary of Endorsed Scales

Scale Name | Acronym | Advantages | Disadvantages | Subscales | Number of Items
Trust in Automated Systems Test | TOAST | Construct subscales | Currently undergoing validation | Understanding, Performance | 9

Trust in Automated Systems Test (TOAST)

The TOAST provides a quick sense of whether people dislike the system to the point that they stop using it and/or override its responses.

Reference

Wojton, H.M., Porter, D., Lane, S.T., Bieber, C., & Madhavan, P. (2020). Initial validation of the trust of automated systems test (TOAST). Journal of Social Psychology, 160(6), 735-750.

Administration

The TOAST may be administered at any time before, during, or after interaction with the system to measure user trust at that point in time. Instruct the respondent to read each statement carefully and indicate the extent to which they agree or disagree using the scale provided.

Survey

Subscale | Item | Statement | Strongly Disagree (1) to Strongly Agree (7)
U | 1 | I understand what the system should do. | 1  2  3  4  5  6  7
P | 2 | The system helps me achieve my goals. | 1  2  3  4  5  6  7
U | 3 | I understand the limitations of the system. | 1  2  3  4  5  6  7
U | 4 | I understand the capabilities of the system. | 1  2  3  4  5  6  7
P | 5 | The system performs consistently. | 1  2  3  4  5  6  7
P | 6 | The system performs the way it should. | 1  2  3  4  5  6  7
P | 7 | I feel comfortable relying on the information provided by the system. | 1  2  3  4  5  6  7
U | 8 | I understand how the system executes tasks. | 1  2  3  4  5  6  7
P | 9 | I am rarely surprised by how the system responds. | 1  2  3  4  5  6  7

Key: U = Understanding subscale. P = Performance subscale

Scoring

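Each subscale is scored separately, resulting in two scores: one for Understanding and one for Performance. To calculate a subscale score, take the average of the responses to the items in that subscale. This process can be expressed formulaically as:

Understanding score = (Item 1 + Item 3 + Item 4 + Item 8) / 4
Performance score = (Item 2 + Item 5 + Item 6 + Item 7 + Item 9) / 5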
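For teams that score responses programmatically, the subscale averaging described above might be sketched as follows. This is a minimal illustration, not an official scoring tool: the function name score_toast and the dictionary input format are hypothetical, the item-to-subscale mapping is taken from the survey key, and rejecting incomplete or out-of-range responses is an assumption (this section does not say how to handle missing data).

```python
# Item numbers per subscale, taken from the survey key
# (U = Understanding, P = Performance).
SUBSCALES = {
    "Understanding": (1, 3, 4, 8),
    "Performance": (2, 5, 6, 7, 9),
}


def score_toast(responses):
    """Return the mean rating for each TOAST subscale.

    `responses` maps item number (1-9) to a rating from 1 to 7.
    Assumption: a respondent must answer every item; the source
    does not describe a rule for missing data.
    """
    for item in range(1, 10):
        if item not in responses:
            raise ValueError(f"missing response for item {item}")
        if not 1 <= responses[item] <= 7:
            raise ValueError(f"item {item} rating must be between 1 and 7")

    # Each subscale score is the plain average of its items.
    return {
        name: sum(responses[i] for i in items) / len(items)
        for name, items in SUBSCALES.items()
    }


# Made-up ratings for one respondent, for illustration only.
example = {1: 6, 2: 5, 3: 5, 4: 6, 5: 4, 6: 5, 7: 4, 8: 3, 9: 2}
print(score_toast(example))  # {'Understanding': 5.0, 'Performance': 4.0}
```

Keeping the two scores separate, rather than combining them into one total, mirrors the scoring instructions, which call for one Understanding score and one Performance score.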

Interpretation

Higher Understanding scores indicate that users trust the system more because they understand it. Higher Performance scores indicate that users trust the system to help them perform their job duties.

Additional Trust Scales

Razin, Y., & Feigh, K.M. (2023). Converging measures and an emergent model: A meta-analysis of human-automation trust questionnaires. [Manuscript submitted for publication]. School of Aerospace Engineering, Georgia Institute of Technology.

Merritt, S.M. (2011). Affective processes in human-automation interactions. Human Factors, 53(4), 356-370.

Schaefer, K.E. (2016). Measuring trust in human-robot interactions: Development of the trust perception scale. In Robust intelligence and trust in autonomous systems (pp. 191-218). Boston, MA: Springer US.

McKnight, D.H., Carter, M., Thatcher, J.B., & Clay, P.F. (2011). Trust in a specific technology: An investigation of its components and measures. ACM Transactions on Management Information Systems (TMIS), 2(2), 1-25.

