Analysis Checklist: Pre-Analysis → Exploratory Data Analysis → Inferential Analysis
| Step | Purpose | Considerations |
|---|---|---|
| Create Dataset | Collect and organize test data into one location. | Data entry; data reduction; dataset structure; missing-value codes; dataset merging; consistent formatting; file types; quality check |
| Document Process | Describe the dataset in supplemental documentation. | Variable descriptions; coding guides; source tracking; collector notes and logs |
| Clarify Analysis Goals | Communicate test goals to analysts. | System and test design review; common understanding |
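The Create Dataset and Document Process steps can be sketched in plain Python. This is a minimal illustration, not a prescribed tool. It assumes each collector delivers a CSV file with the same columns. The codebook, the column names (`trial`, `speed`, `detect`), and the missing-value code `-999` are hypothetical. The sketch covers several of the considerations in the table:

- stacking the collectors' files into one dataset
- normalizing stray whitespace
- tagging each row with its source
- checking values against a coding guide

```python
import csv
import io

# Hypothetical codebook (coding guide): variable descriptions and allowed values.
CODEBOOK = {
    "trial": {"description": "Trial identifier", "type": int},
    "speed": {"description": "Target speed (knots)", "type": float},
    "detect": {"description": "Detection outcome (1 = hit, 0 = miss)",
               "type": int, "allowed": {0, 1}},
}
MISSING_CODE = "-999"  # hypothetical missing-value code agreed on by all collectors


def load_records(text, source):
    """Parse one collector's CSV text and tag each row with its source file."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Consistent formatting: strip stray whitespace from headers and values;
        # short rows yield None values, which become empty strings here.
        clean = {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        clean["source"] = source  # source tracking
        rows.append(clean)
    return rows


def quality_check(rows):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        for name, spec in CODEBOOK.items():
            value = row.get(name)
            if value is None:
                problems.append(f"row {i}: column '{name}' absent")
                continue
            if value == MISSING_CODE:
                continue  # documented missing value, not an entry error
            try:
                parsed = spec["type"](value)
            except ValueError:
                problems.append(f"row {i}: {name}={value!r} is not {spec['type'].__name__}")
                continue
            if "allowed" in spec and parsed not in spec["allowed"]:
                problems.append(f"row {i}: {name}={value!r} not in {sorted(spec['allowed'])}")
        if row.get("trial") in seen:
            problems.append(f"row {i}: duplicate trial {row['trial']}")
        seen.add(row.get("trial"))
    return problems
```

Merging here is simple concatenation: `load_records(a, "a.csv") + load_records(b, "b.csv")`. Headers are stripped, so a file headed `trial, speed, detect` stacks cleanly onto one headed `trial,speed,detect`. A `detect` value of 2 or a repeated trial number is reported rather than silently kept.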
Transitioning from test execution and data collection to data analysis is no trivial task. Seemingly small errors in data entry, or the use of multiple, inconsistent methods of data reduction, can render analyses inconclusive or produce anomalous results if they are not brought to the attention of the analysts. It also pays to organize and document datasets in a format that can easily be shared with and explained to others. This section outlines the steps for producing a usable dataset. It also highlights considerations that can save time and minimize frustration throughout the analysis process.
Organizing & Storing Data
Organizing and Cleaning
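How a dataset is structured, and what it contains, has practical consequences. These choices determine which software and analyses can use the data directly and how much cleaning is needed before analysis can begin.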
- Structure and format: examples and a recommended template
Datasets
- Datasets are commonly displayed as a data matrix. Flat, or wide, datasets are n × p matrices. The n rows are cases (e.g., trials, replicates), and the p columns record the factor levels and response variables.
A second common organization scheme is the long, or stacked, format, in which each row holds a single observation.
Formatting is important for three reasons:
1) some software packages default to a particular layout;
2) some analyses within a given package require a certain format;
3) some data are more intuitively processed in one format than in the other.
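The wide and long layouts described above hold the same information and can be converted into each other. Below is a minimal plain-Python sketch. It assumes a hypothetical test in which each trial records one response under two conditions (`day` and `night`); all names and values are illustrative only.

```python
# Wide (flat) layout: one row per trial, one column per condition (hypothetical data).
wide = [
    {"trial": 1, "day": 0.91, "night": 0.74},
    {"trial": 2, "day": 0.88, "night": 0.70},
]


def wide_to_long(rows, id_col, value_cols, var_name="condition", value_name="response"):
    """Stack the value columns so that each row holds a single observation."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            long_rows.append({id_col: row[id_col], var_name: col, value_name: row[col]})
    return long_rows


def long_to_wide(rows, id_col, var_name="condition", value_name="response"):
    """Spread a long table back out to one row per id and one column per condition."""
    by_id = {}
    for row in rows:
        by_id.setdefault(row[id_col], {id_col: row[id_col]})[row[var_name]] = row[value_name]
    return list(by_id.values())


stacked = wide_to_long(wide, "trial", ["day", "night"])
# stacked[0] == {"trial": 1, "condition": "day", "response": 0.91}
assert long_to_wide(stacked, "trial") == wide  # the round trip preserves the data
```

Here the 2 × 3 wide table (trial, day, night) becomes a 4 × 3 long table (trial, condition, response). In pandas, `DataFrame.melt` and `DataFrame.pivot` perform the same reshaping.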
Related issue: distinguish missing data from true zero values, so that a missing entry is never recorded as 0.
This relates to understanding the quality of the dataset:
- Is a given value actual data or a missing-data code?
- Are outliers measured values or data entry errors? If measured, are they anomalous values that can rightfully be discarded, or legitimate values that should be preserved?
- Are missing data the result of non-response or of an error?
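The questions above can be partly automated by labeling each raw value before analysis. Below is a minimal plain-Python sketch with these assumptions:

- The missing-value codes are hypothetical: `-999` for non-response and `-888` for a collection error.
- The plausible range is supplied by the caller.
- Zero is treated as real data.
- Out-of-range values are flagged for review rather than deleted, since deciding whether an outlier is an error or a legitimate measurement requires the analysts and data collectors.

```python
# Hypothetical missing-value codes; a real test would define these in its coding guide.
MISSING_CODES = {-999: "non-response", -888: "collection error"}


def classify(values, low, high):
    """Label each raw value as data, a documented missing value, or an outlier to review.

    Only the documented codes count as missing; a zero is kept as real data.
    """
    labels = []
    for v in values:
        if v in MISSING_CODES:
            labels.append(("missing", MISSING_CODES[v]))
        elif not (low <= v <= high):
            # Flag, do not discard: the value may be an entry error or a real extreme.
            labels.append(("review", "outside plausible range"))
        else:
            labels.append(("data", v))
    return labels
```

For example, `classify([0, 12.5, -999, 480.0], low=0, high=100)` does the following:

- keeps 0 and 12.5 as data
- labels -999 as non-response
- flags 480.0 for review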