Exploratory Data Analysis

Visualize Raw Data
Preview the data collected for individual variables to understand what there is to work with.
- Scatterplots/Histograms/Graphs
- Distributions
- Outlier Check
- Univariate Assumptions

Process & Clean
Decide how to treat anomalies and recode data entries.
- Outlier Processing
- Data Entry Errors
- Consistent Coding

Summarize Data
Compute descriptive statistics and describe numerical properties of key variables.
- Descriptive Statistics
- Variance
- Range
- Central Tendencies
- Missing Values
- Proportions
- Cross-tabs

Explore Multivariate Data
Find possible relationships among variables and patterns across factors.
- Multivariate Visualizations
- Factor Comparisons
- Influential Variables
- Multivariate Assumptions

Report EDA Conclusions
Document processing and preliminary findings.
- Data Nature & Limitations
- Analysis Candidates
- Suitability of Data to Test Goals

Exploratory Data Analysis (EDA) is an approach to learning about a data set.  The goal is to examine and summarize the data in order to make sense out of the otherwise overwhelming mass of information.  Data exploration and visualization provide tools for ensuring appropriate and accurate descriptions of the data.

EDA can help evaluators:

- Determine quality of data
- Find outliers and other errors or anomalies
- Identify possible analyses
- Discover potential patterns/relationships
- Identify influential variables

Exploring Data Properties

Data Type

Before carrying out analyses, it is important for evaluators to understand the type and quality of data they are working with.  Different types of data convey different information and often must be analyzed differently.  Common data types and operational sources are displayed in the table below.

The type of data results from how the response variable was measured. You can read more about data types in relation to response variables here.

Characteristics of data determine how it can be modeled/displayed and tested.  For example, you cannot compute a meaningful mean of categorical data.  Instead, these data are commonly combined with discrete frequency data to compute proportions.
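As a minimal sketch of that last point, the snippet below counts how often each category occurs and converts the counts to proportions. The `proportions` helper and the `outcomes` values are hypothetical, invented purely for illustration; they are not taken from any real test data.

```python
from collections import Counter


def proportions(values):
    """Return the share of observations that fall in each category."""
    counts = Counter(values)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {category: count / total for category, count in counts.items()}


# Hypothetical categorical outcomes (illustrative only).
outcomes = ["hit", "miss", "hit", "hit", "miss"]
result = proportions(outcomes)
print(result)
```

A mean of the labels "hit" and "miss" has no meaning, but the proportion of each label summarizes the same observations in a usable way.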


Sets of values can be ordered and described by their distribution.

Describe importance/implications of data distributions

Integrate interactive distribution visualizations with adjustable parameters

Numerical Summaries

Like graphical techniques, numerical summaries describe the data and inform the analyst about its characteristics.  These numerical summaries are called descriptive statistics.


Measures of spread: range, variance, IQR, median absolute deviation
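The four measures of spread listed above can be computed with Python's standard `statistics` module, as in the sketch below. The `spread_summary` helper and the sample values are assumptions made for illustration. Note that the IQR depends on the quantile convention used; this sketch relies on the module's default ("exclusive") method, and other software may report slightly different quartiles.

```python
import statistics


def spread_summary(values):
    """Compute range, sample variance, IQR, and median absolute deviation."""
    data = sorted(values)
    # quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
    q1, _, q3 = statistics.quantiles(data, n=4)
    med = statistics.median(data)
    # Median absolute deviation: median of distances from the median.
    mad = statistics.median(abs(x - med) for x in data)
    return {
        "range": data[-1] - data[0],
        "variance": statistics.variance(data),  # sample variance (n - 1)
        "iqr": q3 - q1,
        "mad": mad,
    }


# Hypothetical measurements with one extreme value (illustrative only).
sample = [2, 4, 4, 5, 7, 9, 30]
summary = spread_summary(sample)
print(summary)
```

In this made-up sample the single extreme value inflates the range and variance, while the IQR and median absolute deviation are far less affected, which is why the robust measures are often reported alongside the classical ones.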

Central Tendencies

Measures of central tendency: means, medians, trimmed means
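The sketch below compares the three measures of central tendency named above on a small sample. The `trimmed_mean` helper is a hypothetical illustration that drops an equal share of observations from each end before averaging; the 10% trimming proportion and the sample values are assumptions, not recommendations from this page.

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the observations from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * proportion)  # number of values cut from each tail
    return statistics.mean(data[k:len(data) - k])


# Hypothetical measurements with one extreme value (illustrative only).
sample = [2, 4, 4, 5, 7, 9, 30, 3, 6, 5]
mean_value = statistics.mean(sample)
median_value = statistics.median(sample)
trimmed_value = trimmed_mean(sample, proportion=0.1)
print(mean_value, median_value, trimmed_value)
```

When the three values disagree noticeably, as they do here, that is a hint that the distribution is skewed or contains extreme observations worth inspecting.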

Visualize Test Data

Before conducting sophisticated inferential statistical tests, it is best to simply look at the test data to understand what the analyst has to work with and to see which advanced techniques might be appropriate.  The graphical techniques demonstrated below are common and important ways to look at test data, and each indicates what an analyst hopes to learn from it.

Conclusions from Exploratory Data Analysis

Exploratory Data Analysis (EDA) and data visualization are powerful tools that can highlight problems to be addressed, lead to insights, and suggest patterns in the data.  They do not, however, allow strong comparisons to be drawn or provide estimates of how confident the analyst can be in his or her conclusions.  Here, we will review what you can learn from EDA and how this affects options for further analysis.

  1. Violations of underlying assumptions (distributional and otherwise)
  2. What you have to work with (e.g., range, missing data)
  3. Consequences and remedies
  4. Outliers: considerations (don’t get rid of them!)
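One way to act on the last item is to flag suspected outliers for review instead of deleting them. The sketch below uses the 1.5 × IQR fence rule, which is one common convention and an assumption here, not a method prescribed by this page. The `flag_outliers` helper and the sample values are hypothetical.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Pair each value with True if it lies outside the IQR fences.

    Values are only flagged; nothing is removed from the data.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(x, x < low or x > high) for x in values]


# Hypothetical measurements (illustrative only).
sample = [2, 4, 4, 5, 7, 9, 30]
flags = flag_outliers(sample)
flagged = [x for x, is_outlier in flags if is_outlier]
print(flagged)
```

Keeping the flagged values in the data set lets the analyst investigate whether each one is a data entry error or a genuine observation before deciding how to treat it.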

