Overview
Workload is a measure of the demands placed on the user. We recommend the AFFTC Revised Workload Estimate (ARWES/CSS) for most purposes due to its short length and its intuitive scoring and interpretation. We also detail two versions of the NASA Task Load Index (NASA-TLX). The raw-scoring NASA-TLX is the less complex version but is not appropriate for all situations. The weighted-scoring NASA-TLX evaluates multiple workload dimensions in a granular manner but is more challenging to administer.
Summary of Endorsed Scales
| Scale Name | Acronym | Advantages | Disadvantages | Subscales | Number of Items |
| --- | --- | --- | --- | --- | --- |
| Air Force Flight Test Center Revised Workload Estimate Scale | ARWES/CSS | Very short | Small amount of comparison data available | Overall | 4 |
| NASA Task Load Index, raw scoring | NASA-TLX | Widely used; free app | Long; complicated to administer and score | Overall | 6 |
| NASA Task Load Index, weighted scoring | NASA-TLX | Widely used; free app | Long; very complicated to administer and score | Overall | 6 |
AFFTC Revised Workload Estimate (ARWES)
The ARWES is great for quickly quantifying mental workload with a single question. You may administer multiple ARWES scales, each corresponding to a component of, or a task performed with, the system under test.
Note: This validated, revised scale is sometimes confused with its first, unvalidated version, the Crew Status Survey (CSS). Item wording should precisely match the statements provided below.
Reference
Ames, L.L., & George, E.J. (1993). Revision and verification of a seven-point workload estimate scale. Edwards AFB, CA: Air Force Flight Test Center.
Administration
Instruct users to read each statement carefully and indicate the one that is most representative of their workload during a specified task using a specified system.
Survey
| Rating | Statement |
| --- | --- |
| 1 | Nothing to do; no system demands. |
| 2 | Light activity; minimal demands. |
| 3 | Moderate activity; easily managed; considerable spare time. |
| 4 | Busy; challenging but manageable; adequate time available. |
| 5 | Very busy; demanding to manage; barely enough time. |
| 6 | Extremely busy; very difficult; non-essential tasks postponed. |
| 7 | Overloaded; system unmanageable; essential tasks undone; unsafe. |
Scoring
Because the ARWES is a one-item scale, no scoring is necessary.
Interpretation
The interpretation is indicated by the statement that the user selects. For example, if the user selects statement 2, then you may interpret that the mental workload involved light activity with minimal demands. Evaluators may decide what level of workload is acceptable for a given task or system under test.
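As a minimal sketch of this interpretation step, the selected rating can be mapped back to its statement and checked against an evaluator-chosen acceptability cutoff. The function name and the default cutoff of 5 below are our own illustrative choices, not part of the ARWES itself:

```python
# Map ARWES ratings to their statements and flag ratings above an
# evaluator-chosen acceptability threshold (threshold is hypothetical).
ARWES_STATEMENTS = {
    1: "Nothing to do; no system demands.",
    2: "Light activity; minimal demands.",
    3: "Moderate activity; easily managed; considerable spare time.",
    4: "Busy; challenging but manageable; adequate time available.",
    5: "Very busy; demanding to manage; barely enough time.",
    6: "Extremely busy; very difficult; non-essential tasks postponed.",
    7: "Overloaded; system unmanageable; essential tasks undone; unsafe.",
}

def interpret_arwes(rating: int, acceptable_max: int = 5) -> tuple[str, bool]:
    """Return the selected statement and whether the workload is acceptable."""
    if rating not in ARWES_STATEMENTS:
        raise ValueError("ARWES ratings run from 1 to 7")
    return ARWES_STATEMENTS[rating], rating <= acceptable_max

statement, acceptable = interpret_arwes(2)
# statement -> "Light activity; minimal demands."; acceptable -> True
```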
NASA Task Load Index, raw scoring (NASA-TLX)
The NASA-TLX is widely used but requires a complicated administration procedure and scoring method. Here, we describe a simpler scoring method called raw TLX scoring. However, this method does not consider the relative importance of different workload dimensions. The full (weighted) scoring method may be more appropriate in circumstances where some workload dimensions (physical, mental, etc.) have a significantly greater load than others.
References
Hart, S. G. & Staveland, L. E. (1988) Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In P. A. Hancock and N. Meshkati (Eds.) Human Mental Workload. Amsterdam: North Holland Press.
Hart, S. G. (2006, October). NASA-task load index (NASA-TLX); 20 years later. In Proceedings of the human factors and ergonomics society annual meeting (Vol. 50, No. 9, pp. 904-908). Sage CA: Los Angeles, CA: Sage Publications.
Administration
The NASA-TLX measures workload with respect to a particular task performed by a particular user. The administrator must take care to define both the task and the user that they are investigating. The NASA-TLX cannot measure overall workload for a system or mission.
The administrator should use the administration manual (cited above) to explain the purpose of the study and the basics of the NASA-TLX method to the users before administering the scale. That is, proper administration of this measure requires pre-briefing respondents with the prescribed instructions.
Ideally, users will respond to the items while engaging in the task of interest. However, it is also acceptable to administer the scale immediately after completing the task.
Survey
Users require two separate forms. The first form is a table of definitions for their reference throughout the process (NASA-TLX Reference Sheet Definitions). The second form contains the actual survey items (NASA Task Load Index Rating Scales).
NASA-TLX Reference Sheet Definitions:
| Factor | Endpoints | Description |
| --- | --- | --- |
| MENTAL DEMAND | Low/High | How much mental and perceptual activity was required (e.g., thinking, deciding, calculating, remembering, looking, searching, etc.)? Was the task easy or demanding, simple or complex, exacting or forgiving? |
| PHYSICAL DEMAND | Low/High | How much physical activity was required (e.g., pushing, pulling, turning, controlling, activating, etc.)? Was the task easy or demanding, slow or brisk, slack or strenuous, restful or laborious? |
| TEMPORAL DEMAND | Low/High | How much time pressure did you feel due to the rate or pace at which the tasks or task elements occurred? Was the pace slow and leisurely or rapid and frantic? |
| PERFORMANCE | Good/Poor | How successful do you think you were in accomplishing the goals of the task set by the experimenter (or yourself)? How satisfied were you with your performance in accomplishing these goals? |
| EFFORT | Low/High | How hard did you have to work (mentally and physically) to accomplish your level of performance? |
| FRUSTRATION LEVEL | Low/High | How insecure, discouraged, irritated, stressed, and annoyed versus secure, gratified, content, relaxed, and complacent did you feel during the task? |
NASA Task Load Index Rating Scales:

Scoring
Most forms do not include numeric anchors along the scale; instead, they provide verbal anchors at either end of the rating scale. Tick marks range from 0 to 100 in 5-point increments, for a total of 21 tick marks. Ratings are scored based on where the user marked the scale; the score is the value of the tick at or immediately above the mark (i.e., round up).
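This round-up rule can be sketched in a few lines, assuming mark positions are recorded as values between 0 and 100 (the function name is our own):

```python
import math

def score_tlx_mark(position: float) -> int:
    """Round a mark's 0-100 position up to the next 5-point tick."""
    if not 0 <= position <= 100:
        raise ValueError("mark position must be between 0 and 100")
    return math.ceil(position / 5) * 5

print(score_tlx_mark(23))  # -> 25 (mark between the 20 and 25 ticks)
print(score_tlx_mark(62))  # -> 65 (mark between the 60 and 65 ticks)
```

A mark landing exactly on a tick keeps that tick's value, since rounding up has no effect there.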
For example, a mark falling between the 20 and 25 tick marks would be scored as 25, and a mark falling between the 60 and 65 tick marks would be scored as 65.
The scores from the six items are summed, and the sum is divided by 6 to obtain an overall score between 0 and 100. This process can be expressed formulaically as:

Raw TLX score = (Mental Demand + Physical Demand + Temporal Demand + Performance + Effort + Frustration) / 6
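A short sketch of this averaging step (the example ratings below are invented for illustration):

```python
def raw_tlx(ratings: dict[str, int]) -> float:
    """Average the six subscale ratings into a 0-100 raw TLX score."""
    factors = ("Mental Demand", "Physical Demand", "Temporal Demand",
               "Performance", "Effort", "Frustration")
    if set(ratings) != set(factors):
        raise ValueError("exactly the six TLX factors are required")
    return sum(ratings.values()) / 6

example = {"Mental Demand": 55, "Physical Demand": 20, "Temporal Demand": 70,
           "Performance": 40, "Effort": 60, "Frustration": 35}
print(round(raw_tlx(example), 1))  # -> 46.7
```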

Interpretation
Higher scores indicate higher workload. Normative information for NASA-TLX scores can be found in the following references:
Hertzum M. (2021). Reference values and subscale patterns for the task load index (TLX): A meta-analytic review. Ergonomics, 64(7), 869-878.
Grier, R. A. (2015). How High is High? A Meta-Analysis of NASA-TLX Global Workload Scores. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 59(1), 1727–1731.
NASA Task Load Index, weighted scoring (NASA-TLX)
Here, we detail the administration and scoring procedures for the full NASA-TLX. This method can be complicated to administer, and administering it incorrectly will likely result in uninterpretable data.
This method is best when you need detailed workload information about a task with demands distributed unequally across different workload dimensions. For example, a task with a high mental demand but no time limits (low temporal demand).
References
Hart, S. G. & Staveland, L. E. (1988) Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In P. A. Hancock and N. Meshkati (Eds.) Human Mental Workload. Amsterdam: North Holland Press.
Hart, S. G. (2006, October). NASA-task load index (NASA-TLX); 20 years later. In Proceedings of the human factors and ergonomics society annual meeting (Vol. 50, No. 9, pp. 904-908). Sage CA: Los Angeles, CA: Sage Publications.
Administration
An app for administering the weighted NASA-TLX is available on the NASA website. This app is particularly useful for presenting the weighting task (described below).
The NASA-TLX measures workload with respect to a particular task performed by a particular user. The administrator must take care to define both the task and the user that they are investigating. The NASA-TLX cannot measure overall workload for a system or mission.
The administrator should use the administration manual (cited above) to explain the purpose of the study and the basics of the NASA-TLX method to the users before administering the scale. That is, proper administration of this measure requires pre-briefing respondents with the prescribed instructions.
Ideally, users will respond to the items while engaging in the task of interest. However, it is also acceptable to administer the scale immediately after completing the task.
Survey
This survey has two major parts.
The first part asks respondents to make 15 pairwise comparisons about which workload factors were more significant in the task (NASA Task Load Index Pairwise Comparisons of Factors). Pairs of workload sources should be presented individually and in a random order.
The second part is the rating process described in the NASA Task Load Index, raw scoring (NASA-TLX) section above.
Respondents should also be given the NASA-TLX Reference Sheet Definitions for their reference throughout the process. This form is described above.
NASA Task Load Index Pairwise Comparisons of Factors:
| Factor comparison | | |
| --- | --- | --- |
| Effort | OR | Performance |
| Temporal Demand | OR | Frustration |
| Temporal Demand | OR | Effort |
| Physical Demand | OR | Frustration |
| Performance | OR | Frustration |
| Physical Demand | OR | Temporal Demand |
| Physical Demand | OR | Performance |
| Temporal Demand | OR | Mental Demand |
| Frustration | OR | Effort |
| Performance | OR | Mental Demand |
| Performance | OR | Temporal Demand |
| Mental Demand | OR | Effort |
| Mental Demand | OR | Physical Demand |
| Effort | OR | Physical Demand |
| Frustration | OR | Mental Demand |
Scoring
Sources-of-load (weight) scores for each of the six factors are computed from each respondent's responses to the pairwise comparisons. Each time a factor is selected in the paired-choice task, it receives a value of 1; these values are then summed for each factor. The resulting weight for each factor can range from 0 to 5, and the weights across all six factors sum to 15. These weights are used to compute the weighted average score.
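The weight computation is a simple tally, sketched below; the list of selections is invented for illustration (it corresponds to a respondent who ranked the factors in a strict order):

```python
from collections import Counter

def tlx_weights(selections: list[str]) -> Counter:
    """Tally how often each factor was chosen across the 15 pairwise comparisons."""
    if len(selections) != 15:
        raise ValueError("expected one selection per pairwise comparison (15 total)")
    # Each factor appears in five comparisons, so its weight ranges from
    # 0 to 5, and the six weights always sum to 15.
    return Counter(selections)

selections = (["Mental Demand"] * 5 + ["Temporal Demand"] * 4 +
              ["Effort"] * 3 + ["Performance"] * 2 + ["Frustration"] * 1)
weights = tlx_weights(selections)
print(weights["Mental Demand"], sum(weights.values()))  # -> 5 15
```

A factor never selected (here, Physical Demand) simply receives a weight of 0, which `Counter` returns by default.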
Raw magnitude-of-workload (rating) scores are calculated using the procedure described in the NASA Task Load Index, raw scoring (NASA-TLX) section above.
The overall weighted workload score for each respondent is computed by multiplying each rating (between 0 and 100) by the weight (between 0 and 5) that the respondent gave to that factor. The sum of these weighted ratings is then divided by 15 (the sum of the weights). This process can be expressed formulaically as:

Weighted TLX score = (Σ ratingᵢ × weightᵢ) / 15, summed over the six factors i
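Combining ratings and weights can be sketched as follows; the example values are invented for illustration:

```python
def weighted_tlx(ratings: dict[str, int], weights: dict[str, int]) -> float:
    """Weighted average of the six ratings; weights sum to 15 by construction."""
    if sum(weights.get(f, 0) for f in ratings) != 15:
        raise ValueError("weights from the 15 pairwise comparisons must sum to 15")
    return sum(ratings[f] * weights.get(f, 0) for f in ratings) / 15

ratings = {"Mental Demand": 80, "Physical Demand": 10, "Temporal Demand": 70,
           "Performance": 40, "Effort": 60, "Frustration": 30}
weights = {"Mental Demand": 5, "Temporal Demand": 4, "Effort": 3,
           "Performance": 2, "Frustration": 1, "Physical Demand": 0}
print(round(weighted_tlx(ratings, weights), 1))  # -> 64.7
```

Note that a factor with a weight of 0 (here, Physical Demand) contributes nothing to the overall score regardless of its rating.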

Interpretation
Responses to the pairwise comparisons can be interpreted as falling between “not relevant” (score of 0) and “more important than any other factor” (score of 5).
Higher overall scores indicate higher workload.