Bayesian analysis: an introduction
A little bit of history…
The probability theory underlying Bayesian statistics goes back to Thomas Bayes (1702-1761), whose theorem gives the approach its name. His ideas were accepted by some and challenged by others, and it was not until the mid-20th century that they began to gain popularity. Even then, Bayesian methods were not widely used until about 1990, in part because the required calculations were often mathematically intractable. As computing power increased and new algorithms (notably Markov chain Monte Carlo) were developed, Bayesian statistics became more accessible, and analysts were no longer restricted to the simple models of years past. Today, Bayesian models can be fit to a wide range of problems, including quite complex ones.
Unlike frequentist (classical) statistics, Bayesian analysis treats the observed data as fixed and the parameters as random quantities. In classical statistics, the parameter is a fixed but unknown constant, the data are a random sample, and we compute point estimates and confidence intervals for the parameter. In Bayesian statistics, by contrast, we compute a posterior distribution, which represents our uncertainty about the parameter, i.e., which values the parameter is likely to take. This distribution depends not only on the observed data but also on the prior information at hand. That prior knowledge can come from previous data, similar systems, expert knowledge or opinion, or even “gut instinct.” The posterior distribution therefore combines what we knew beforehand with the new evidence in the data, summarizing what we know about the parameter after the data are observed.
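In symbols, this combination is Bayes’ theorem. The notation here is ours, not the source’s: θ is the parameter, y the observed data, p(θ) the prior, p(y | θ) the likelihood, and p(θ | y) the posterior. The denominator p(y) does not depend on θ, so the posterior is proportional to likelihood times prior:

$$
p(\theta \mid y) \;=\; \frac{p(y \mid \theta)\,p(\theta)}{p(y)} \;\propto\; p(y \mid \theta)\,p(\theta),
\qquad
p(y) \;=\; \int p(y \mid \theta)\,p(\theta)\,d\theta .
$$

The integral in p(y) is often what made Bayesian calculations hard before modern computing, since it rarely has a closed form.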
Bayesian intervals: incorporating prior information
Many techniques used to analyze test results come from what is known as a “frequentist” framework, in which terms such as “hypothesis testing,” “confidence intervals,” and “p-values” appear often. The frequentist framework is built around the concept of long-run probability: the probability of an event is the proportion of times it occurs over a long sequence of repeated trials.
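To make the long-run idea concrete, here is a small simulation sketch. The probability 0.3, the trial counts, and the function name `long_run_frequency` are arbitrary illustrative choices, not part of the original material. The sketch repeats an event that has a known probability and reports the observed proportion, which settles toward that probability as the number of trials grows.

```python
import random


def long_run_frequency(p: float, n_trials: int, seed: int = 0) -> float:
    """Simulate n_trials independent events, each occurring with probability p,
    and return the proportion of trials in which the event occurred."""
    rng = random.Random(seed)  # fixed seed so each run is reproducible
    hits = sum(1 for _ in range(n_trials) if rng.random() < p)
    return hits / n_trials


# The observed proportion wanders for small n and settles near p = 0.3 for large n.
for n in (10, 100, 10_000, 1_000_000):
    print(f"{n:>9} trials: observed proportion = {long_run_frequency(0.3, n):.4f}")
```

Under the frequentist view, “the probability is 0.3” means exactly this limiting proportion; it is a property of the repeated process, not a degree of belief about a single event.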
Increasingly, a Bayesian approach to modeling is being considered as a powerful alternative to this frequentist framework. For an overview of Bayesian analysis, see our introductory material. The Bayesian framework has many strengths, including the ability to pool information across multiple samples or to incorporate previously existing information. This can supplement limited data, which in turn can increase precision and reduce testing costs.
Specifically, the Bayesian framework lets us place a “prior” distribution on the parameter, built from earlier studies or from prior knowledge. The process we will illustrate has three steps:
- Construct a prior from previously existing information (e.g., old test data from a similar system)
- Construct the likelihood from your collected test data (e.g., your sample data)
- Estimate the posterior distribution using Bayes’ theorem (put all of the pieces together)
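The three steps above can be sketched for the common case of pass/fail test data, where the parameter θ is a success probability. Everything specific here is an assumption for illustration: the counts are hypothetical (not the notional data used in this guide), the old data are given full weight, and a Beta prior is chosen because, combined with a binomial likelihood, it yields a Beta posterior in closed form (a conjugate pair). The `Beta` class is a hypothetical helper built on Python’s standard `random.betavariate`, and the credible interval is approximated by Monte Carlo draws.

```python
import random
from dataclasses import dataclass


@dataclass
class Beta:
    """Hypothetical helper: a Beta(a, b) distribution for a success probability."""
    a: float
    b: float

    def mean(self) -> float:
        return self.a / (self.a + self.b)

    def credible_interval(self, level: float = 0.95, draws: int = 100_000, seed: int = 1):
        """Equal-tailed credible interval approximated from Monte Carlo draws."""
        rng = random.Random(seed)
        samples = sorted(rng.betavariate(self.a, self.b) for _ in range(draws))
        lo_idx = int((1 - level) / 2 * draws)
        hi_idx = int((1 + level) / 2 * draws) - 1
        return samples[lo_idx], samples[hi_idx]


# Step 1: prior from previously existing information (hypothetical old test data).
# Updating a uniform Beta(1, 1) on the old data gives Beta(1 + successes, 1 + failures).
old_successes, old_failures = 18, 2
prior = Beta(1 + old_successes, 1 + old_failures)

# Step 2: likelihood from the new test data (hypothetical counts). For independent
# pass/fail trials it is binomial: L(theta) is proportional to theta**s * (1 - theta)**f.
new_successes, new_failures = 7, 3

# Step 3: Bayes' theorem. A Beta prior times a binomial likelihood is again a Beta,
# so the update reduces to adding the new counts to the prior's parameters.
posterior = Beta(prior.a + new_successes, prior.b + new_failures)

print(f"prior mean:     {prior.mean():.3f}")
print(f"posterior mean: {posterior.mean():.3f}")
lo, hi = posterior.credible_interval()
print(f"95% credible interval for theta: ({lo:.3f}, {hi:.3f})")
```

Because the prior contributes the equivalent of 20 earlier trials, the posterior is narrower than an analysis of the 10 new trials alone would be; whether old data from a similar system deserve that much weight is a modeling judgment, and in practice it is often discounted.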
An example with notional data helps to clarify these steps. First, we will consider how a frequentist approach would proceed; then we will compare it with the Bayesian approach, including how the results of each are interpreted.