Bayesian methods have long attracted the interest of statisticians, but in most areas they have been used only infrequently in statistical practice. This is due in part to the lack of accessible software. As one recent paper put it, “However, most of this work can be understood and used only by those with a high level of statistical sophistication, a fetish for archaic notation, and a desire for programming and debugging.”
The apparently subjective requirement to specify prior distributions has also been an obstacle.
More recently, improvements in computational methods and statistical software have mitigated much of the first concern, and the development and use of “noninformative” objective priors and of Bayes factors have addressed the second. Of course, where genuine prior information exists, perhaps from earlier studies, informative priors can still be used.
The Bayes factor (BF) compares two hypotheses, such as a null and an alternative, or any two models. It is the ratio of how likely the observed data are under one model to how likely they are under the other, so it is the factor by which the data convert the prior odds of the two models into their posterior odds. The Bayes factor is a measure of the strength of the evidence and is used in place of p values to reach a conclusion. A large Bayes factor says that the evidence favors, or strongly favors, the alternative hypothesis over the null, or one model over the other. A BF of 10, for example, says that the data are 10 times more likely under the model than under the comparison model, so the posterior odds in its favor are 10 times the prior odds. Bayes factors can be used for any pair of models. A Bayes factor larger than 10 may be considered strong or very strong evidence for that model, while very small values (for example, below 1/10) strongly favor the comparison model or the null, but there is no generally accepted standard.
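To make the odds arithmetic concrete, here is a minimal Python sketch. Posterior odds are simply the Bayes factor times the prior odds. The verbal labels are an assumption: they follow one commonly used scale (Jeffreys' categories as relabeled by Lee and Wagenmakers, where 1 to 3 is "anecdotal"), and, as noted above, other conventions exist.

```python
# Verbal labels for Bayes factors. ASSUMPTION: this follows one common
# convention (Jeffreys' scale as relabeled by Lee and Wagenmakers);
# there is no generally accepted standard.
LABELS = [(100, "extreme"), (30, "very strong"), (10, "strong"),
          (3, "moderate"), (1, "anecdotal")]


def posterior_odds(bf10, prior_odds=1.0):
    """Posterior odds of H1 versus H0: the Bayes factor times the prior odds."""
    return bf10 * prior_odds


def evidence_label(bf10):
    """Describe BF10 in words; values below 1 favor H0 (BF01 = 1 / BF10)."""
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    favored = "H1" if bf10 >= 1 else "H0"
    strength = max(bf10, 1 / bf10)  # size of the BF in the favored direction
    for cutoff, label in LABELS:
        if strength > cutoff:
            return f"{label} evidence for {favored}"
    return "no evidence either way"  # BF exactly 1


print(posterior_odds(10, prior_odds=0.5))  # 5.0
print(evidence_label(12))                  # strong evidence for H1
print(evidence_label(0.05))                # strong evidence for H0
```

With even prior odds (1.0), the posterior odds equal the Bayes factor itself, which is why the Bayes factor alone is often reported.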
In keeping with this trend, four Bayesian extension commands have been released for SPSS Statistics. They are STATS BAYES TTEST (Analyze > Compare Means > Bayesian T Test), STATS BAYES ANOVA (Analyze > General Linear Model > Bayesian Anova), STATS BAYES REGR (Analyze > Regression > Bayesian Regression), and STATS BAYES CROSSTABS (Analyze > Descriptive Statistics > Bayesian Crosstab). We will demonstrate the Bayesian t test and Bayesian Regression procedures in this post.
The T Test
T tests come in several flavors: one sample, paired two sample, and independent two sample. Both the traditional t test and its Bayesian equivalent handle all of these cases, but we will look only at the independent two-sample case. Using the creditpromo.sav file shipped with Statistics, we compare the mean of the dollars variable between the two groups defined by the insert variable. You can read about the traditional analysis of this example in the Case Studies available from the Help menu.
The main table of the traditional test output looks like this.
It shows a moderately significant difference in dollars spent, with a t value of -2.26 and a significance level (two-sided p value) of .024.
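For readers who want to connect the t value and the significance level, the sketch below computes a two-sided p value from t. It is a simplification: the traditional test uses the t distribution with the test's degrees of freedom, which the text does not state, so the sketch substitutes the standard normal distribution. That approximation is only reasonable when the degrees of freedom are large (several hundred or more).

```python
import math


def two_sided_p_normal(t):
    """Approximate two-sided p value for a t statistic.

    Uses the standard normal distribution instead of Student's t, so it
    ignores the degrees of freedom and is only close for large samples.
    erfc(|t| / sqrt(2)) equals 2 * P(Z > |t|) for a standard normal Z.
    """
    return math.erfc(abs(t) / math.sqrt(2))


p = two_sided_p_normal(-2.26)
print(round(p, 4))
```

For t = -2.26 this gives roughly .024, in line with the reported significance level; with small samples the exact t-based p value would be noticeably larger.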
Now let’s look at the Bayesian test. All three types of t test are handled in one dialog box; for the independent samples test, it looks like this.
The values of the Group variable are determined from the data, so it must have exactly two distinct nonmissing values. In Options, we specified three different values for the prior scale parameter, each representing a different standardized effect size.
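The prior scale parameter is the scale r of a Cauchy prior on the standardized effect size, as in the JZS Bayes factor of Rouder et al. (2009) that the BayesFactor R package is built on; its default, r = √2/2 ≈ .7071, is the package's "medium" setting. The sketch below computes that Bayes factor for an independent-samples t test by one-dimensional numerical integration, assuming equal group variances. It is an illustration of the formula, not the extension's own code, and it is not guaranteed to match the Statistics output to the last digit.

```python
import math


def jzs_bf10(t, n1, n2, r=math.sqrt(2) / 2, steps=20000):
    """Sketch of the JZS Bayes factor BF10 for an independent-samples t test.

    The effect size has a Cauchy(0, r) prior, written as a normal prior whose
    variance g * r^2 is mixed over g ~ inverse-gamma(1/2, 1/2).
    """
    nu = n1 + n2 - 2             # degrees of freedom
    n_eff = n1 * n2 / (n1 + n2)  # effective sample size for two groups
    # Likelihood of t under H0 (effect size fixed at 0), up to a shared constant
    null = (1 + t * t / nu) ** (-(nu + 1) / 2)

    def integrand(g):
        a = 1 + n_eff * g * r * r
        likelihood = a ** -0.5 * (1 + t * t / (a * nu)) ** (-(nu + 1) / 2)
        # Inverse-gamma(1/2, 1/2) density of g
        prior = (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1 / (2 * g))
        return likelihood * prior

    # Integrate over g in (0, inf) via g = u / (1 - u), dg = du / (1 - u)^2,
    # using the midpoint rule on u in (0, 1) so the endpoints are never hit.
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        g = u / (1 - u)
        total += integrand(g) / (1 - u) ** 2
    return total * h / null


# Hypothetical inputs for illustration only
for r in (math.sqrt(2) / 2, 1.0, math.sqrt(2)):
    print(round(r, 4), jzs_bf10(-1.5, 100, 100, r=r))
```

The loop shows how the choice of r matters: when the observed effect is small, a wider prior (larger r) spreads belief over large effects the data do not support, so the Bayes factor moves toward the null.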
Here is the table of Bayes factors.
This shows that for the medium effect size prior scale (.7071), which is the default, there is very slight evidence in favor of the alternative hypothesis of a nonzero difference, while the other values give no such evidence and even a little evidence in favor of the null. The posterior effect size (table not shown), which is the standardized mean difference, lies between -.361 and -.017 using the first prior value. Calculating the effect size from the traditional t test output gives -.203.
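The text does not say exactly how the -.203 was obtained. One standard conversion from an independent-samples t statistic to the pooled-SD standardized mean difference (Cohen's d) is d = t·√(1/n1 + 1/n2), sketched below. The group sizes in the usage line are hypothetical, since the text does not give them.

```python
import math


def cohens_d_from_t(t, n1, n2):
    """Pooled-SD standardized mean difference recovered from an
    independent-samples t statistic and the two group sizes."""
    return t * math.sqrt(1 / n1 + 1 / n2)


# HYPOTHETICAL group sizes, chosen only to illustrate the formula
print(round(cohens_d_from_t(-2.26, 250, 250), 3))
```

Because the published t value is rounded to two decimals, an effect size recomputed this way can differ from a reported value in the third decimal place.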
We are thus left with two ways to report the result. Using the traditional t test, we reject the null: there is a difference, but the effect size is small. Using the Bayesian test, we report that the evidence in favor of the alternative is very weak, just anecdotal, along with a range of effect sizes that includes the value from the traditional test.
The Regression Example
For this example, we use the employee data.sav file shipped with Statistics, with salary as the dependent variable. First, change the measurement level of the educ variable to scale. Using the traditional linear regression procedure with educ and jobtime as the predictors, we get this output.
If we use the stepwise method, we get this.
With Bayesian regression, we can calculate Bayes factors either for all possible regressions or for various subsets of them. Since we have only two predictors here, we choose all possible regressions, but with many regressors this could produce too many models. The dialog box and the Bayes factor output table look like this.
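Why "all possible regressions" becomes impractical is easy to see by enumerating the models. The sketch below lists every non-empty subset of the predictors, each of which is a candidate model needing its own Bayes factor. Whether the extension also scores the intercept-only model is not stated here, so the sketch leaves it out.

```python
from itertools import combinations


def all_subset_models(predictors):
    """Every non-empty subset of the predictors, i.e. every regression
    model an all-possible-subsets search would have to score."""
    models = []
    for k in range(1, len(predictors) + 1):
        models.extend(combinations(predictors, k))
    return models


print(all_subset_models(["educ", "jobtime"]))
# [('educ',), ('jobtime',), ('educ', 'jobtime')]
```

With k predictors there are 2^k - 1 such models: 3 for the two predictors here, but 1,023 for 10 predictors and 1,048,575 for 20.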
The model using only the educ variable is very strongly favored, which is consistent with the stepwise result, and its posterior model probability is .837. The procedure also lets you compare each model with any other, i.e., compute the Bayes factor for each pair. We can also choose a single model for which the posterior distribution of the coefficients is computed. Choosing model 2, we get this.
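Posterior model probabilities and pairwise comparisons both follow from Bayes factors computed against a common reference model. The sketch below assumes that reference is the intercept-only model (whose Bayes factor against itself is 1) and, by default, equal prior probability for every model; the extension's own prior choices may differ. The Bayes factors in the usage line are hypothetical, not the employee data.sav results.

```python
def posterior_model_probs(bf_vs_ref, prior=None):
    """Posterior model probabilities from Bayes factors against a common
    reference model. ASSUMPTION: equal prior probabilities unless given."""
    if prior is None:
        prior = {m: 1.0 for m in bf_vs_ref}
    weights = {m: bf_vs_ref[m] * prior[m] for m in bf_vs_ref}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def bf_between(bf_vs_ref, a, b):
    """Bayes factor of model a over model b; the common reference cancels."""
    return bf_vs_ref[a] / bf_vs_ref[b]


# HYPOTHETICAL Bayes factors against the intercept-only model
bfs = {"intercept only": 1.0, "a": 3.0, "b": 6.0}
print(posterior_model_probs(bfs))
print(bf_between(bfs, "b", "a"))  # 2.0
```

Dividing two Bayes factors that share a reference is what makes "compare any model with any other" cheap once each model has been scored once.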
The estimated education effect is quite close to the traditional regression estimate of 3895.067, and so is its standard error.
In summary, we have seen two of the four new Bayesian procedures and compared their output with that of the corresponding traditional procedures. These procedures have additional features that you can explore for yourself. They let you do statistical analysis without relying on traditional p value-based procedures and can be especially helpful for model selection. The dialog help for these extensions includes some references on Bayesian methods that may help you get started.
In Statistics version 22 or later, you can download and install these procedures from the Utilities menu. For earlier versions, download them from the Extension Commands collection (https://www.ibm.com/developerworks/community/files/app?lang=en#/collection/23c2eac7-e524-4393-a4b9-0d224a2a0eda) and install them from the Utilities menu. At this writing, these procedures are not yet available on the GitHub Downloads feature of the new Predictive Analytics community, but they will be posted there. All of these procedures require the R Essentials. When you install any of them, the R BayesFactor package by Richard D. Morey and Jeffrey N. Rouder, which the procedures use, is also installed.
We would love to hear your thoughts about Bayesian methods and your experience with these procedures.