Overview
Workload is a measure of the demands placed on the user. We recommend the AFFTC Revised Workload Estimate (ARWES/CSS) for most purposes due to its short length and its intuitive scoring and interpretation. We also detail two versions of the NASA Task Load Index (NASA-TLX). The raw-scoring NASA-TLX is the less complex version but is not appropriate for all situations. The weighted-scoring NASA-TLX evaluates multiple workload dimensions in a granular manner but is more challenging to administer.
Summary of Endorsed Scales
| Scale Name | Acronym | Advantages | Disadvantages | Subscales | Number of Items |
| --- | --- | --- | --- | --- | --- |
| Air Force Flight Test Center Revised Workload Estimate Scale | ARWES/CSS | Very short | Small amount of comparison data available | Overall | 4 |
| NASA Task Load Index, raw scoring | NASA-TLX | Widely used; free app | Long; complicated to administer and score | Overall | 6 |
| NASA Task Load Index, weighted scoring | NASA-TLX | Widely used; free app | Long; very complicated to administer and score | Overall | 6 |
AFFTC Revised Workload Estimate (ARWES)
The ARWES is great for quickly quantifying mental workload with a single question. You may administer multiple ARWES scales, each corresponding to a component of, or a task performed with, the system under test.
Note: This validated, revised scale is sometimes confused with its first, unvalidated version, the Crew Status Survey (CSS). Item wording should precisely match the statements provided below.
Reference
Ames, L.L., & George, E.J. (1993). Revision and verification of a seven-point workload estimate scale. Edwards AFB, CA: Air Force Flight Test Center.
Administration
Instruct users to read each statement carefully and indicate the one that is most representative of their workload during a specified task using a specified system.
Survey
| Rating | Statement |
| --- | --- |
| 1 | Nothing to do; no system demands. |
| 2 | Light activity; minimal demands. |
| 3 | Moderate activity; easily managed; considerable spare time. |
| 4 | Busy; challenging but manageable; adequate time available. |
| 5 | Very busy; demanding to manage; barely enough time. |
| 6 | Extremely busy; very difficult; non-essential tasks postponed. |
| 7 | Overloaded; system unmanageable; essential tasks undone; unsafe. |
Scoring
Because the ARWES is a one-item scale, no scoring is necessary.
Interpretation
The interpretation is indicated by the statement that the user selects. For example, if the user selects statement 2, then you may interpret that the mental workload involved light activity with minimal demands. Evaluators may decide what level of workload is acceptable for a given task or system under test.
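As a minimal sketch of this interpretation step, the selected rating can be mapped back to its statement and checked against an evaluator-chosen acceptability cutoff. The function name and the default cutoff of 5 below are our own illustrative choices, not part of the ARWES itself:

```python
# Map ARWES ratings to their statements and flag ratings above an
# evaluator-chosen acceptability threshold (threshold is hypothetical).
ARWES_STATEMENTS = {
    1: "Nothing to do; no system demands.",
    2: "Light activity; minimal demands.",
    3: "Moderate activity; easily managed; considerable spare time.",
    4: "Busy; challenging but manageable; adequate time available.",
    5: "Very busy; demanding to manage; barely enough time.",
    6: "Extremely busy; very difficult; non-essential tasks postponed.",
    7: "Overloaded; system unmanageable; essential tasks undone; unsafe.",
}

def interpret_arwes(rating: int, acceptable_max: int = 5) -> tuple[str, bool]:
    """Return the selected statement and whether the workload is acceptable."""
    if rating not in ARWES_STATEMENTS:
        raise ValueError("ARWES ratings run from 1 to 7")
    return ARWES_STATEMENTS[rating], rating <= acceptable_max

statement, acceptable = interpret_arwes(2)
# statement -> "Light activity; minimal demands."; acceptable -> True
```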
NASA Task Load Index, raw scoring (NASA-TLX)
The NASA-TLX is widely used but requires a complicated administration procedure and scoring method. Here, we describe a simpler scoring method called raw TLX scoring. However, this method does not consider the relative importance of different workload dimensions. The full (weighted) scoring method may be more appropriate in circumstances where some workload dimensions (physical, mental, etc.) have a significantly greater load than others.
References
Hart, S. G. & Staveland, L. E. (1988) Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In P. A. Hancock and N. Meshkati (Eds.) Human Mental Workload. Amsterdam: North Holland Press.
Hart, S. G. (2006, October). NASA-task load index (NASA-TLX); 20 years later. In Proceedings of the human factors and ergonomics society annual meeting (Vol. 50, No. 9, pp. 904-908). Sage CA: Los Angeles, CA: Sage Publications.
Administration
The NASA-TLX measures workload with respect to a particular task performed by a particular user. The administrator must take care to define both the task and the user that they are investigating. The NASA-TLX cannot measure overall workload for a system or mission.
The administrator should use the administration manual (cited above) to explain the purpose of the study and the basics of the NASA-TLX method to the users before administering the scale. That is, proper administration of this measure requires pre-briefing respondents with the prescribed instructions.
Ideally, users will respond to the items while engaging in the task of interest. However, it is also acceptable to administer the scale immediately after completing the task.
Survey
Users require two separate forms. The first form is a table of definitions for their reference throughout the process (NASA-TLX Reference Sheet Definitions). The second form contains the actual survey items (NASA Task Load Index Rating Scales).
NASA-TLX Reference Sheet Definitions:
| Factor | Endpoints | Description |
| --- | --- | --- |
| MENTAL DEMAND | Low/High | How much mental and perceptual activity was required (e.g., thinking, deciding, calculating, remembering, looking, searching, etc.)? Was the task easy or demanding, simple or complex, exacting or forgiving? |
| PHYSICAL DEMAND | Low/High | How much physical activity was required (e.g., pushing, pulling, turning, controlling, activating, etc.)? Was the task easy or demanding, slow or brisk, slack or strenuous, restful or laborious? |
| TEMPORAL DEMAND | Low/High | How much time pressure did you feel due to the rate or pace at which the tasks or task elements occurred? Was the pace slow and leisurely or rapid and frantic? |
| PERFORMANCE | Good/Poor | How successful do you think you were in accomplishing the goals of the task set by the experimenter (or yourself)? How satisfied were you with your performance in accomplishing these goals? |
| EFFORT | Low/High | How hard did you have to work (mentally and physically) to accomplish your level of performance? |
| FRUSTRATION LEVEL | Low/High | How insecure, discouraged, irritated, stressed, and annoyed versus secure, gratified, content, relaxed, and complacent did you feel during the task? |
NASA Task Load Index Rating Scales:

Scoring
Most forms do not include numeric anchors along the scale; instead, they provide verbal anchors at either end of the rating scale. Tick marks range from 0 to 100 in 5-point increments, for a total of 21 tick marks. Ratings are scored based on where the user marked the scale; the score is the value of the tick at or immediately above the mark (i.e., round up).
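This round-up rule can be sketched in a few lines, assuming mark positions are recorded as values between 0 and 100 (the function name is our own):

```python
import math

def score_tlx_mark(position: float) -> int:
    """Round a mark's 0-100 position up to the next 5-point tick."""
    if not 0 <= position <= 100:
        raise ValueError("mark position must be between 0 and 100")
    return math.ceil(position / 5) * 5

print(score_tlx_mark(23))  # -> 25 (mark between the 20 and 25 ticks)
print(score_tlx_mark(62))  # -> 65 (mark between the 60 and 65 ticks)
```

A mark landing exactly on a tick keeps that tick's value, since rounding up has no effect there.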
For example, a mark falling between the 20 and 25 tick marks would be scored as 25, and a mark falling between the 60 and 65 tick marks would be scored as 65.
The scores from the six items are summed, and the sum is divided by 6 to obtain an overall score between 0 and 100. This process can be expressed formulaically as:

Raw TLX score = (Mental Demand + Physical Demand + Temporal Demand + Performance + Effort + Frustration) / 6
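A short sketch of this averaging step (the example ratings below are invented for illustration):

```python
def raw_tlx(ratings: dict[str, int]) -> float:
    """Average the six subscale ratings into a 0-100 raw TLX score."""
    factors = ("Mental Demand", "Physical Demand", "Temporal Demand",
               "Performance", "Effort", "Frustration")
    if set(ratings) != set(factors):
        raise ValueError("exactly the six TLX factors are required")
    return sum(ratings.values()) / 6

example = {"Mental Demand": 55, "Physical Demand": 20, "Temporal Demand": 70,
           "Performance": 40, "Effort": 60, "Frustration": 35}
print(round(raw_tlx(example), 1))  # -> 46.7
```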

Interpretation
Higher scores indicate higher workload. Normative information for NASA-TLX scores can be found in the following references:
Hertzum M. (2021). Reference values and subscale patterns for the task load index (TLX): A meta-analytic review. Ergonomics, 64(7), 869-878.
Grier, R. A. (2015). How High is High? A Meta-Analysis of NASA-TLX Global Workload Scores. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 59(1), 1727–1731.
NASA Task Load Index, weighted scoring (NASA-TLX)
Here, we detail the administration and scoring procedures for the full NASA-TLX. This method can be complicated to administer, and administering it incorrectly will likely result in uninterpretable data.
This method is best when you need detailed workload information about a task with demands distributed unequally across different workload dimensions. For example, a task with a high mental demand but no time limits (low temporal demand).
References
Hart, S. G. & Staveland, L. E. (1988) Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In P. A. Hancock and N. Meshkati (Eds.) Human Mental Workload. Amsterdam: North Holland Press.
Hart, S. G. (2006, October). NASA-task load index (NASA-TLX); 20 years later. In Proceedings of the human factors and ergonomics society annual meeting (Vol. 50, No. 9, pp. 904-908). Sage CA: Los Angeles, CA: Sage Publications.
Administration
An app for administering the weighted NASA-TLX is available on the NASA website. This app is particularly useful for presenting the weighting task (described below).
The NASA-TLX measures workload with respect to a particular task performed by a particular user. The administrator must take care to define both the task and the user that they are investigating. The NASA-TLX cannot measure overall workload for a system or mission.
The administrator should use the administration manual (cited above) to explain the purpose of the study and the basics of the NASA-TLX method to the users before administering the scale. That is, proper administration of this measure requires pre-briefing respondents with the prescribed instructions.
Ideally, users will respond to the items while engaging in the task of interest. However, it is also acceptable to administer the scale immediately after completing the task.
Survey
This survey has two major parts.
The first part asks respondents to make 15 pairwise comparisons about which workload factors were more significant in the task (NASA Task Load Index Pairwise Comparisons of Factors). Pairs of workload sources should be presented individually and in a random order.
The second part is the rating process described in the NASA Task Load Index, raw scoring (NASA-TLX) section above.
Respondents should also be given the NASA-TLX Reference Sheet Definitions for their reference throughout the process. This form is described above.
NASA Task Load Index Pairwise Comparisons of Factors:
| Factor comparison | | |
| --- | --- | --- |
| Effort | OR | Performance |
| Temporal Demand | OR | Frustration |
| Temporal Demand | OR | Effort |
| Physical Demand | OR | Frustration |
| Performance | OR | Frustration |
| Physical Demand | OR | Temporal Demand |
| Physical Demand | OR | Performance |
| Temporal Demand | OR | Mental Demand |
| Frustration | OR | Effort |
| Performance | OR | Mental Demand |
| Performance | OR | Temporal Demand |
| Mental Demand | OR | Effort |
| Mental Demand | OR | Physical Demand |
| Effort | OR | Physical Demand |
| Frustration | OR | Mental Demand |
Scoring
Sources-of-load (weight) scores for each of the six factors are computed from each respondent's responses to the pairwise comparisons. Each time a factor is selected in the paired-choice task, it receives a value of 1; these values are then summed for each factor. The resulting weight for each factor can range from 0 to 5, and the weights across all six factors sum to 15. These weights are used to compute the weighted average score.
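The weight computation is a simple tally, sketched below; the list of selections is invented for illustration (it corresponds to a respondent who ranked the factors in a strict order):

```python
from collections import Counter

def tlx_weights(selections: list[str]) -> Counter:
    """Tally how often each factor was chosen across the 15 pairwise comparisons."""
    if len(selections) != 15:
        raise ValueError("expected one selection per pairwise comparison (15 total)")
    # Each factor appears in five comparisons, so its weight ranges from
    # 0 to 5, and the six weights always sum to 15.
    return Counter(selections)

selections = (["Mental Demand"] * 5 + ["Temporal Demand"] * 4 +
              ["Effort"] * 3 + ["Performance"] * 2 + ["Frustration"] * 1)
weights = tlx_weights(selections)
print(weights["Mental Demand"], sum(weights.values()))  # -> 5 15
```

A factor never selected (here, Physical Demand) simply receives a weight of 0, which `Counter` returns by default.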
Raw magnitude-of-workload (rating) scores are calculated using the procedure described in the NASA Task Load Index, raw scoring (NASA-TLX) section above.
The overall weighted workload score for each respondent is computed by multiplying each rating (between 0 and 100) by the weight (between 0 and 5) that the respondent gave to that factor. The sum of these weighted ratings is then divided by 15 (the sum of the weights). This process can be expressed formulaically as:

Weighted TLX score = (Σ ratingᵢ × weightᵢ) / 15, summed over the six factors i
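Combining ratings and weights can be sketched as follows; the example values are invented for illustration:

```python
def weighted_tlx(ratings: dict[str, int], weights: dict[str, int]) -> float:
    """Weighted average of the six ratings; weights sum to 15 by construction."""
    if sum(weights.get(f, 0) for f in ratings) != 15:
        raise ValueError("weights from the 15 pairwise comparisons must sum to 15")
    return sum(ratings[f] * weights.get(f, 0) for f in ratings) / 15

ratings = {"Mental Demand": 80, "Physical Demand": 10, "Temporal Demand": 70,
           "Performance": 40, "Effort": 60, "Frustration": 30}
weights = {"Mental Demand": 5, "Temporal Demand": 4, "Effort": 3,
           "Performance": 2, "Frustration": 1, "Physical Demand": 0}
print(round(weighted_tlx(ratings, weights), 1))  # -> 64.7
```

Note that a factor with a weight of 0 (here, Physical Demand) contributes nothing to the overall score regardless of its rating.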

Interpretation
Responses to the pairwise comparisons can be interpreted as falling between “not relevant” (score of 0) and “more important than any other factor” (score of 5).
Higher overall scores indicate higher workload.