Exploratory Data Analysis

Exploratory Data Analysis (EDA) is an approach to learning about a data set. The goal is to examine and summarize the data in order to make sense of an otherwise overwhelming mass of information. Data exploration and visualization provide tools for ensuring appropriate and accurate descriptions of the data. EDA can help evaluators:

- Determine quality of data
- Find outliers and other errors or anomalies
- Identify possible analyses
- Discover potential patterns/relationships
- Identify influential variables

Exploring Data Properties

Data Type

Before carrying out analyses, it is important for evaluators to understand the type and quality of data they are working with. Different types of data convey different information and often must be analyzed differently. Common data types and operational sources are displayed in the table below.

The type of data results from how the response variable was measured. You can read more about data types in relation to response variables here. The characteristics of the data determine how they can be modeled, displayed, and tested. For example, you cannot compute a meaningful mean of categorical data. Instead, categorical data are commonly tallied into discrete frequency counts, which are then used to compute proportions.
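As a minimal sketch of the point about categorical data, the snippet below tallies category labels and converts the counts to proportions. The function name `category_proportions` and the `outcomes` values are hypothetical, invented only for illustration.

```python
from collections import Counter

def category_proportions(values):
    """Return {category: proportion} for a sequence of categorical labels."""
    counts = Counter(values)          # frequency count per category
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

# Hypothetical test outcomes, invented for illustration only.
outcomes = ["hit", "miss", "hit", "hit", "miss", "hit", "hit", "miss"]
proportions = category_proportions(outcomes)
print(proportions)
```

A mean of the labels "hit" and "miss" is undefined, but the proportion of each label is a meaningful summary.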

Distribution

Sets of values can be ordered and described by their distribution. The shape of a distribution (for example, symmetric or skewed, with one peak or several) affects which numerical summaries and statistical tests are appropriate.
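One simple way to look at a distribution is to sort the values into equal-width bins and count how many fall in each. The sketch below does this with plain Python and prints a rough text histogram. The function name `histogram_counts`, the bin width, and the `scores` values are assumptions made for illustration, not part of any particular tool.

```python
def histogram_counts(values, bin_width):
    """Count how many values fall in each bin [lower, lower + bin_width)."""
    counts = {}
    for value in values:
        lower = (value // bin_width) * bin_width  # lower edge of the value's bin
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical measurements, invented for illustration only.
scores = [12, 15, 17, 21, 22, 22, 23, 25, 31, 48]
counts = histogram_counts(scores, bin_width=10)

# One row per bin, one '#' per observation.
for lower, count in counts.items():
    print(f"[{lower}, {lower + 10}) {'#' * count}")
```

Even in this crude form, the output shows most values clustered between 10 and 30 with a single value far to the right.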

Numerical Summaries

Like graphical techniques, numerical summaries describe the data and inform the analyst about their characteristics. These numerical summaries are called descriptive statistics.

Variance/Spread

Measures of spread: range, variance, IQR, median absolute deviation
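The measures of spread listed above can be sketched with Python's standard `statistics` module. The helper name `spread_summary` and the sample values are hypothetical. The quartiles use the module's default "exclusive" method, and other software may use a slightly different quartile convention.

```python
import statistics

def spread_summary(values):
    """Return range, sample variance, IQR, and median absolute deviation."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartiles, default method
    median = statistics.median(values)
    abs_deviations = [abs(x - median) for x in values]
    return {
        "range": max(values) - min(values),
        "variance": statistics.variance(values),    # sample variance (n - 1)
        "iqr": q3 - q1,
        "mad": statistics.median(abs_deviations),   # unscaled MAD
    }

# Hypothetical measurements, invented for illustration only.
scores = [12, 15, 17, 21, 22, 22, 23, 25, 31, 48]
summary = spread_summary(scores)
print(summary)
```

The range and variance are pulled upward by the single large value, while the IQR and the median absolute deviation are much less sensitive to it.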

Central Tendencies

Measures of central tendency: means, medians, trimmed means
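As a rough illustration of these three measures, the sketch below computes a mean, a median, and a trimmed mean. The `trimmed_mean` helper is hypothetical and uses one common convention: drop the same fraction of observations from each end before averaging. The sample values are invented.

```python
import statistics

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)   # observations to drop per end
    if 2 * k >= len(ordered):
        raise ValueError("proportion too large: no observations would remain")
    return statistics.fmean(ordered[k:len(ordered) - k])

# Hypothetical measurements, invented for illustration only.
scores = [12, 15, 17, 21, 22, 22, 23, 25, 31, 48]
mean_value = statistics.fmean(scores)
median_value = statistics.median(scores)
trimmed_value = trimmed_mean(scores, 0.10)
print(mean_value, median_value, trimmed_value)
```

Here the mean sits above both the median and the 10% trimmed mean because one large value pulls it upward, which is the usual reason for reporting more than one measure.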

Visualize Test Data

Before conducting sophisticated inferential statistical tests, it is best to simply look at the test data to understand what the analyst has to work with and which advanced techniques might be appropriate. The graphical techniques demonstrated below are important and common ways to look at test data, and each indicates what an analyst hopes to learn from doing so.

Conclusions from Exploratory Data Analysis

Exploratory Data Analysis (EDA) and Data Visualization are powerful tools that can highlight problems to be addressed, lead to insights, and suggest patterns in the data. They do not, however, allow strong comparisons to be drawn or provide estimates of how confident the analyst can be in his or her conclusions. Here, we will review what you can learn from EDA and how this impacts options for further analysis.

Violation of underlying assumptions (distributional and otherwise)
What you have to work with (e.g., range, missing data)
Consequences and remedies
Outliers – considerations (don’t get rid of them!)
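In keeping with the advice not to discard outliers, a minimal sketch of flagging them for review (rather than deleting them) is shown below. It uses Tukey's 1.5 × IQR fences, a common convention that is swapped in here for illustration and is not prescribed by this page. The function name `flag_outliers` and the sample values are hypothetical.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences; nothing is removed."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartiles, default method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr          # lower and upper fences
    return [v for v in values if v < low or v > high]

# Hypothetical measurements, invented for illustration only.
scores = [12, 15, 17, 21, 22, 22, 23, 25, 31, 48]
flagged = flag_outliers(scores)
print(flagged)
```

The flagged values stay in `scores`. The list only tells the analyst which observations deserve a closer look, for example to check for recording errors or unusual test conditions.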
