Exploratory Data Analysis (EDA) is an approach to learning about a data set. The goal is to examine and summarize the data in order to make sense out of the otherwise overwhelming mass of information. Data exploration and visualization provide tools for ensuring appropriate and accurate descriptions of the data.
EDA can help evaluators:
- Determine quality of data
- Find outliers and other errors or anomalies
- Identify possible analyses
- Discover potential patterns/relationships
- Identify influential variables
Exploring Data Properties
Before carrying out analyses, it is important for evaluators to understand the type and quality of data they are working with. Different types of data convey different information and often must be analyzed differently. Common data types and operational sources are displayed in the table below.
The type of data results from how the response variable was measured. You can read more about data types in relation to response variables here.
Characteristics of the data determine how they can be modeled, displayed, and tested. For example, you cannot compute a meaningful mean of categorical data. Instead, categorical observations are commonly counted per category, and those frequencies are used to compute proportions.
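As a minimal sketch of summarizing categorical data by proportions rather than a mean, the Python snippet below counts each category and divides by the total. The function name `category_proportions` and the hit/miss outcomes are hypothetical and invented for illustration; they do not come from any real test.

```python
from collections import Counter


def category_proportions(values):
    """Return {category: proportion} for a list of categorical values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}  # no observations, so no proportions to report
    return {category: count / total for category, count in counts.items()}


# Hypothetical test outcomes (illustrative only).
outcomes = ["hit", "miss", "hit", "hit", "miss", "hit", "hit", "hit"]
proportions = category_proportions(outcomes)
print(proportions)
```

Averaging the labels themselves would be meaningless, but the proportion of each label is a well-defined summary that can feed later analyses.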
Sets of values can be ordered and described by their distribution.
Describe importance/implications of data distributions
Integrate interactive distribution visualizations with adjustable parameters
Like graphical techniques, numerical summaries describe the data and inform the analyst about its characteristics. These numerical summaries are called descriptive statistics.
Measures of spread: range, variance, IQR, median absolute deviation
Measures of central tendency: means, medians, trimmed means
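The measures listed above can be computed with Python's standard `statistics` module. The following is a sketch under stated assumptions: the helper names (`trimmed_mean`, `median_absolute_deviation`, `describe`), the 10% trimming proportion, and the sample measurements are all hypothetical choices made for illustration.

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the values from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut per tail
    trimmed = ordered[k:len(ordered) - k] if k else ordered
    return statistics.mean(trimmed)


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median (unscaled)."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])


def describe(values):
    """Collect common measures of central tendency and spread."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "trimmed_mean": trimmed_mean(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance
        "iqr": q3 - q1,
        "mad": median_absolute_deviation(values),
    }


# Hypothetical measurements with one extreme value (illustrative only).
measurements = [2, 4, 4, 5, 5, 6, 7, 8, 9, 50]
summary = describe(measurements)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With one extreme value in the sample, the mean and range shift noticeably while the median, trimmed mean, IQR, and median absolute deviation change much less, which is one reason to report several measures side by side.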
Visualize Test Data
Before conducting sophisticated inferential statistical tests, it is best to simply look at the test data, both to understand what the analyst has to work with and to see which advanced techniques might be appropriate. The graphical techniques demonstrated below cover common and important ways to look at test data and indicate what an analyst hopes to learn from each.
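As one illustration of simply looking at the data, the sketch below prints a rough text histogram using only the Python standard library. It is not one of the graphical techniques referred to in this section; the function name `text_histogram`, the equal-width binning, and the sample values are assumptions made for illustration.

```python
def text_histogram(values, bins=5, width=40):
    """Bin values into equal-width intervals and build one text bar per bin."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would land in bin index `bins`; clamp it into the last bin.
        idx = min(int((v - lo) / span * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, count in enumerate(counts):
        left = lo + span * i / bins
        right = lo + span * (i + 1) / bins
        bar = "#" * round(width * count / peak)
        lines.append(f"{left:8.2f} to {right:8.2f} | {bar} {count}")
    return counts, lines


# Hypothetical measurements (illustrative only).
measurements = [2, 4, 4, 5, 5, 6, 7, 8, 9, 50]
counts, lines = text_histogram(measurements)
print("\n".join(lines))
```

Even this crude view shows most values piled into the first bin with a single value far to the right, the kind of feature an analyst would want to investigate before choosing a test.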
Conclusions from Exploratory Data Analysis
Exploratory Data Analysis (EDA) and data visualization are powerful tools that can highlight problems to be addressed, lead to insights, and suggest patterns in the data. They do not, however, allow strong comparisons to be drawn or provide estimates of how confident the analyst can be in the conclusions. Here, we will review what you can learn from EDA and how this affects options for further analysis.
- Violations of underlying assumptions (distributional and otherwise)
- What you have to work with (e.g., range, missing data)
- Consequences and remedies
- Outliers – considerations (don’t get rid of them!)
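The points above about missing data and outliers can be sketched in Python as follows. The example counts missing entries and flags, rather than deletes, values outside the common 1.5 × IQR fences. The function name `flag_outliers`, the use of `None` to mark missing values, the multiplier of 1.5, and the sample data are assumptions made for illustration; other outlier rules may suit a given data set better.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Pair each value with True if it lies outside the k * IQR fences.

    Missing entries (None) are kept in place and never flagged.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(v, v is not None and not (low <= v <= high)) for v in values]


# Hypothetical measurements with one missing entry (illustrative only).
measurements = [2, 4, 4, None, 5, 5, 6, 7, 8, 9, 50]
labelled = flag_outliers(measurements)

missing = sum(1 for v in measurements if v is None)
flagged = [v for v, is_outlier in labelled if is_outlier]
print(f"missing: {missing}, flagged for review: {flagged}")
```

The flagged values stay in the data set. The flag only marks them for follow-up so the analyst can decide whether they reflect a recording error or a real, informative result.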