Pre-Analysis



Create Dataset: Collect and organize test data into one location.
- Data Entry
- Data Reduction
- Dataset Structure
- Missing Value Codes
- Dataset Merging
- Consistent Formatting
- File Types
- Quality check

Document Process: Describe dataset in supplemental documentation.
- Variable Descriptions
- Coding Guides
- Source Tracking – Collector Notes & Logs

Clarify Analysis Goals: Communicate test goals to analysts.
- System & Test Design Review
- Common Understanding

Transitioning from test execution and data collection to data analysis is no trivial task. Seemingly small errors in data entry, or the use of multiple methods of data reduction, can render analyses inconclusive or lead to an anomalous result if they are not brought to the attention of the analysts. Furthermore, it pays to organize and document datasets in a format that can easily be shared with and explained to others. This section outlines the steps for producing a usable dataset and highlights considerations that can save time and minimize frustration throughout the analysis process.

Organizing & Storing Data

Organizing and Cleaning

The construction and content of datasets have practical implications.

-Structure & Format examples, recommended template

Datasets

-Commonly displayed as a data matrix. Flat, or wide, datasets are n x p matrices in which cases (e.g., trials, replicates) occupy the n rows and factors and response variables make up the p columns.

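A second common organization scheme is the long, or stacked, format, in which each row holds a single measurement and additional columns identify the case and the variable that was measured.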

Formatting is important because 1) some software defaults to a particular layout, 2) some analyses within software require a certain format, and 3) some data are more intuitively processed in one format than the other.
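The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (trial, operator, detect_time, miss_distance), the values, and the helper wide_to_long are hypothetical and are not taken from any real test dataset.

```python
# Hypothetical wide-format dataset: one row per case (trial),
# one column per factor or response variable.
wide_data = [
    {"trial": 1, "operator": "A", "detect_time": 12.4, "miss_distance": 3.1},
    {"trial": 2, "operator": "B", "detect_time": 15.0, "miss_distance": 2.7},
]


def wide_to_long(rows, id_cols, value_cols):
    """Stack the value columns so each output row holds one measurement."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            # Copy the identifying columns, then record which variable
            # this row describes and the value that was measured.
            record = {key: row[key] for key in id_cols}
            record["variable"] = col
            record["value"] = row[col]
            long_rows.append(record)
    return long_rows


long_data = wide_to_long(
    wide_data,
    id_cols=["trial", "operator"],
    value_cols=["detect_time", "miss_distance"],
)

for record in long_data:
    print(record)
```

Each trial in the wide layout becomes two rows in the long layout, one per response variable. Statistical packages typically offer a built-in reshaping function for this, so a hand-written helper like the one above is mainly useful for seeing what the conversion does.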

Related issue: distinguish missing data from true zero values.

This relates to understanding the quality of the dataset.

-Is a value actual data or a missing data code?

-Are outliers measured values or data entry errors? If measured, are they anomalous values that are rightfully discarded, or legitimate values that should be preserved?

-Is missing data a non-response or an error?
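One way to keep missing entries from being confused with measured zeros is to convert every agreed missing-value code to an explicit marker when the data are read in. The sketch below assumes the codes "", "NA", and "-999"; these are placeholders, and the real codes should come from the dataset's coding guide. The function name parse_measurement and the sample column are likewise hypothetical.

```python
# Assumed missing-value codes; replace with the codes in your coding guide.
MISSING_CODES = {"", "NA", "-999"}


def parse_measurement(raw):
    """Return a float for real data, or None when the entry is a missing-value code."""
    text = raw.strip()
    if text in MISSING_CODES:
        return None
    return float(text)


# Hypothetical raw column as it might appear in a spreadsheet export.
raw_column = ["0", "3.2", "-999", "", "0.0", "NA"]
parsed = [parse_measurement(value) for value in raw_column]

# Count missing entries and measured zeros separately.
n_missing = sum(value is None for value in parsed)
n_true_zero = sum(value == 0 for value in parsed if value is not None)

print(parsed)
print("missing:", n_missing, "true zeros:", n_true_zero)
```

Keeping None (or the analysis software's own missing-value marker) separate from 0 means summary statistics are not silently pulled toward zero, and the count of missing entries can be reported alongside the results.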
