Going Bayesian

Posted 23 Sep 2019

You might have bad memories of statistics as a stodgy, boring subject you had to take at uni for a semester or two. But statistics is in constant evolution and there are many things we can do today that were simply not possible even five years ago. At Forethought we are always looking for new ways to give our clients information they can be confident in – Bayesian analysis is one of these ways.

Bayesian techniques used to be limited to academics and researchers. Not anymore. At Forethought we are making the most of the new analytical tools that have become available in recent years. Significance testing, small sample studies, and multilevel models are just some of the areas where Bayesian methods have found a practical application at Forethought.

But while we look at new ways to give our clients better insights, we must remember that no technique should be applied blindly. Bayesian methods are inherently more complex, and they require us to take a more active role in the analysis process. There are also important philosophical differences between the Bayesian approach and the traditional one, differences that every practitioner should be aware of.

For a long time Bayesian methods were the preserve of a limited number of practitioners. There are many reasons why that was the case – they are slower than traditional approaches, more complex, and they require a little more thinking. Things are changing rapidly though, thanks to greater computing power and the increased availability of new analytical tools. These changes have not escaped us at Forethought, and today we often turn to Bayesian methods in our work.

What is Bayesian?

Thomas Bayes was an English minister and mathematician who lived in the 18th century. Not much is known about his life, and his most important work (“An Essay Towards Solving a Problem in the Doctrine of Chances”) was published two years after his death, in 1763.

According to Bayes, the way we learn about the world is intrinsically probabilistic, getting closer and closer to the truth as we collect more facts. Prior knowledge from our past experiences, together with new evidence that we gather every day, form the basis of our decision-making process.

Consider, for instance, the case of Andrew. Andrew loves Japanese cuisine and he is a little reluctant to try different cuisines. His partner, though, prefers French food, and finally manages to drag Andrew to a French restaurant. Because he can’t say no to his partner, Andrew gets to experience an increasing number of French restaurants. As he gathers more and more evidence, Andrew shifts from being a Japanese food evangelist to a French food advocate. That is, Andrew changes his mind because he gets to experience a new cuisine, compare it with his past knowledge, and conclude that he likes one better than the other.

Andrew’s travails in food are highly subjective. After all, as the Romans used to say, tastes are not up for debate. But what happens when our protagonist is confronted with a more science-y conundrum? Let’s say that Andrew’s partner is a flat-Earth believer and presents him with a vast trove of papers that purportedly corroborate this view. Andrew reads the new information thoroughly and, taken in isolation, he finds the argument quite convincing. But Andrew also knows very well that there is very strong evidence, accumulated over centuries, that the Earth is round. As much as he loves his partner, Andrew’s beliefs don’t shift this time – the Earth is indeed round.

Andrew’s thought process is particularly interesting because it highlights the difference between the Bayesian and the traditional hypothesis-testing approaches. In the classic statistical approach, also known as frequentist, a hypothesis is assessed only against the new evidence. This would be equivalent to Andrew’s partner erasing his memory, putting him under a glass dome, and asking him to judge the flat-Earth papers for what they are. In the Bayesian approach, we still assess the new evidence but we also weigh it against our prior beliefs in order to reach a conclusion.
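To make Andrew’s reasoning concrete, here is a minimal sketch of Bayes’ rule for a yes/no hypothesis. Every number in it is an assumption made up for illustration (the prior of one in a million, and how much more likely the papers would be on a flat Earth); none of them comes from real data.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for a yes/no hypothesis: P(hypothesis | evidence)."""
    numerator = prior * likelihood_if_true
    evidence = numerator + (1.0 - prior) * likelihood_if_false
    return numerator / evidence

# Hypothetical numbers: the papers are 20 times more likely to exist
# if the Earth is flat (0.20) than if it is round (0.01).
papers_if_flat = 0.20
papers_if_round = 0.01

# Andrew with centuries of evidence behind him: a tiny prior for "flat".
andrew = posterior(1e-6, papers_if_flat, papers_if_round)

# Andrew under the glass dome: no memory, so flat and round start at 50/50.
glass_dome = posterior(0.5, papers_if_flat, papers_if_round)

print(f"Andrew with his prior:  {andrew:.6f}")
print(f"Andrew under the dome:  {glass_dome:.3f}")
```

The same papers leave Andrew with a belief in a flat Earth of roughly two in a hundred thousand, while the memory-less Andrew ends up about 95% convinced. The evidence is identical; only the prior differs.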

Put another way, if you flip a coin, a Bayesian will tell you to go ahead and collect your data, but will also point out that – based on prior knowledge – the coin most likely has a 50/50 chance of landing heads or tails. A frequentist, on the other hand, will simply ask you to collect as much data as you can, with no prior expectation about whether you will end up with tails, heads, or a coin standing on its edge!

As you may have guessed, prior belief is a feature of the Bayesian approach and it is of particular importance when the new evidence that we have at our disposal is limited. This makes intuitive sense, but frequentists usually object that prior information can be highly subjective and not always grounded in science. A little bit like Andrew’s initial preference for Japanese food over French cuisine.
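The coin example can be sketched in a few lines. This assumes a Beta prior on the probability of heads, which is a common textbook choice and not necessarily what any given analysis would use; the prior strength (equivalent to having already seen 10 heads and 10 tails) and the flip counts are hypothetical.

```python
def posterior_mean_heads(heads, flips, prior_heads=10, prior_tails=10):
    """Posterior mean of P(heads) under a Beta(prior_heads, prior_tails) prior.

    The prior behaves like extra flips observed before the experiment.
    """
    return (prior_heads + heads) / (prior_heads + prior_tails + flips)

# Limited evidence: three flips, all heads.
few_flips_raw = 3 / 3                               # data only: 1.0
few_flips_bayes = posterior_mean_heads(3, 3)        # 13/23, about 0.57

# Plenty of evidence: 700 heads in 1,000 flips.
many_flips_raw = 700 / 1000                         # data only: 0.7
many_flips_bayes = posterior_mean_heads(700, 1000)  # 710/1020, about 0.70

print(f"3 flips:    raw {few_flips_raw:.2f}, Bayesian {few_flips_bayes:.2f}")
print(f"1000 flips: raw {many_flips_raw:.2f}, Bayesian {many_flips_bayes:.2f}")
```

With three flips the prior does most of the talking; with a thousand flips the two estimates are nearly the same, which is why the choice of prior matters most when evidence is scarce.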

This is all very well and it makes for a rich philosophical debate between Bayesians and frequentists. During the 20th century it looked like the debate had been settled in favour of the frequentists, mostly because their methods are easier to implement. Bayesian techniques can be computationally very intensive, while the traditional statistical methods that we still use today are relatively simple since they were largely developed before the advent of computers. Things have changed though, and it’s time we reassess our analytical tools in line with what we can do today.

Why Do We Care?

In most routine applications you won’t see much of a difference between a frequentist approach and a Bayesian one. But at Forethought we don’t really do “routine”, and in non-routine cases a Bayesian technique is often not only the preferable option but the only viable one.

Significance testing is a thorny subject in statistics. There are plenty of papers out there that detail the limitations of the frequentist methods and the abuse and misuse of p-values. In the long term, it is Forethought’s aspiration to replace the traditional significance tests with fully probabilistic techniques.

In the meantime, we have implemented Bayesian methods in several scenarios. For instance, in its perfectly airtight world the frequentist approach usually assumes that the distribution of a variable follows a bell curve. It’s a nice assumption that, unfortunately, doesn’t always hold true. At Forethought we rarely observe such well-behaved variables and that is the case for many other fields as well – good luck finding a bell-shaped example in the stock market! If you cannot trust your frequentist model to follow all the assumptions that it is supposed to follow, then you should start using Bayesian techniques. But at Forethought we use Bayesian models for other reasons too.

Small sample studies are a case in point. Build a frequentist model with a handful of respondents and see how confident you are in reporting the results (you know that you shouldn’t be!). A Bayesian model is better equipped to deal with small samples, because the frequentist approach was developed in a large sample framework: many of its guarantees hold only in the limit, as you keep collecting data, and are not available for small samples of finite size. Of course, and as mentioned before, in a Bayesian context the estimates for small sample studies become quite sensitive to the specification of the prior belief, but we will return to this point later. First, let’s talk about our favourite application for Bayesian methods – multilevel models.

[Figure: Bayesian chart]

In a regression model you have a dependent variable, you have a set of independent variables, and you build your model so that you can quantify how well your predictors can explain the outcome. We’ve been doing this, in one form or another, for more than two centuries; not that much time in the scheme of things, and in fact regression analysis is still a very active area of research. A multilevel model is an extension of the classic regression model, with the added benefit of being able to model clusters that are a natural part of the data. Take any dataset and you are most likely to find clusters in it – individuals, families, countries, basically any way you can find to categorise your observations. If you have clusters, then why not make the most of them and build models at both cluster level and overall level? You’ll get better estimates and predictions because your model is reflecting a truer picture of the data.

For example, suppose that you are building a model where the dependent variable is “trust in science” – if you do it in the ordinary way, you’ll get one set of results and that will be it. If you use a multilevel approach you will still get your overall picture but you’ll also get as many sets of results as the number of clusters you have, and you might even be able to isolate the flat-Earth believers and explain what’s wrong with them! Forethought is not new to multilevel modelling, and in fact we have been using Hierarchical Bayes techniques for many years for Prophecy Thoughts and Feelings®. But now, thanks to recent advancements in the practical implementation of these methods, we can use multilevel models in a much greater number of applications.
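The core idea of a multilevel model, often called partial pooling, can be sketched without any modelling library. The snippet below is a deliberately simplified stand-in and not a full Bayesian multilevel model: the cluster names, the scores, and the constant `k` (an assumed ratio of within-cluster to between-cluster variance) are all hypothetical, and a real model would estimate that ratio from the data.

```python
from statistics import mean

def partial_pool(groups, k=5.0):
    """Shrink each cluster mean towards the overall mean.

    k is an ASSUMED ratio of within-cluster to between-cluster variance.
    Small clusters are pulled strongly towards the overall picture,
    large clusters mostly keep their own mean.
    """
    all_values = [v for values in groups.values() for v in values]
    overall = mean(all_values)
    estimates = {}
    for name, values in groups.items():
        n = len(values)
        weight = n / (n + k)  # how much we trust the cluster's own data
        estimates[name] = weight * mean(values) + (1 - weight) * overall
    return overall, estimates

# Hypothetical "trust in science" scores (0-10) for two clusters.
scores = {
    "cluster_a": [8, 7, 9, 8, 7, 8, 9, 8, 7, 9],  # 10 respondents
    "cluster_b": [2, 3],                          # only 2 respondents
}

overall, estimates = partial_pool(scores)
print(f"overall mean: {overall:.2f}")
for name, value in estimates.items():
    print(f"{name}: raw {mean(scores[name]):.2f}, pooled {value:.2f}")
```

The two-respondent cluster is pulled a long way towards the overall mean, while the ten-respondent cluster barely moves. That is the sense in which a multilevel model gives you results at both cluster level and overall level at the same time.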

Frequentists will tell you that you can build these models in the traditional framework too, and that, in theory, is correct. What happens in practice is quite different, with the frequentist models often failing to converge because of their inherent complexity. When those models fail, the poor analyst is left to their own devices, with no clear methodology on how to resolve the issue. In most cases the only feasible solution is to prune (i.e. simplify) the model until it reaches convergence. Such practices are questionable, because this tinkering-until-it-fits approach is based not on a hypothesis but on the analyst simply wanting the model to work.

Those Pesky Priors…

The major point of contention between Bayesians and frequentists remains, as you may have guessed by now, the specification of the prior belief. The frequentists believe that having to stipulate in advance what we think about something, before testing a hypothesis, fundamentally undermines the notion of scientific objectivity. While establishing a prior certainly requires more thinking on our part, is that really a bad thing? We can collect all the data in the world, but this is useless without context. In a Bayesian framework we are being asked, instead, to play an active role in the analysis and in judging the plausibility of its results.

In practical terms, the priors we use at Forethought are so-called regularising priors. They are there just to make sure that the model doesn’t get too excited about the data – a gentle nudge in the right direction so as to achieve convergence. We can use these regularising priors because at Forethought we collect good quality, abundant data, and when you can play with lots of data even an informative prior is not going to change your results too much.
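One way to picture a regularising prior is a one-predictor regression with a Normal prior centred on zero for the slope. This is a generic textbook sketch, not a description of the models used at Forethought; the data, `prior_sd` and `noise_sd` are hypothetical, and the noise level is assumed to be known to keep the arithmetic simple.

```python
def regularised_slope(xs, ys, prior_sd=1.0, noise_sd=1.0):
    """Posterior mean slope for y = b * x (no intercept).

    Assumes b ~ Normal(0, prior_sd) and Normal noise with a KNOWN noise_sd.
    The prior acts as a penalty that nudges the slope towards zero.
    """
    penalty = (noise_sd / prior_sd) ** 2
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)

def least_squares_slope(xs, ys):
    """Ordinary least-squares slope through the origin, with no prior."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Hypothetical data: three observations, then the same pattern 100 times over.
xs, ys = [1, 2, 3], [2.1, 3.9, 6.2]
big_xs, big_ys = xs * 100, ys * 100

ols = least_squares_slope(xs, ys)
small_sample = regularised_slope(xs, ys)
large_sample = regularised_slope(big_xs, big_ys)

print(f"least squares:           {ols:.3f}")
print(f"with prior, 3 points:    {small_sample:.3f}")
print(f"with prior, 300 points:  {large_sample:.3f}")
```

With three observations the prior visibly tempers the estimate; with three hundred it makes almost no difference, which is the point about abundant data made above.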

But, if you have a strong belief in something, a Bayesian model will offer you the chance to incorporate that in your model, something that you can’t do with a traditional method. Suppose, for instance, that you’ve run five waves of a study and that a driver of a certain outcome has consistently shown an impact of around 30%. You get your wave six, you build your model, and suddenly the impact has gone down to a mere 10%. This is what consultants would call an unintuitive result and, in the frequentist world, there is not much you can do about it. You are limited to analysing your evidence, and if the results don’t match your previous experience, you’re out of luck. But why act in a vacuum when you have solid and consistent beliefs accumulated over the preceding five waves? In the Bayesian framework you can build your model, incorporate your previously accumulated experience, and get more plausible results. And as you move on to waves seven, eight or nine of the study you will continuously update your priors and build models that better represent reality.
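The wave-six scenario can be sketched as a Normal prior combined with a Normal estimate. The 30% and 10% figures come from the example above; the prior standard deviation and the standard errors are hypothetical, chosen only to show how the weighting works.

```python
from math import sqrt

def update_normal(prior_mean, prior_sd, estimate, std_error):
    """Combine a Normal prior with a Normal estimate (conjugate update).

    Each source is weighted by its precision (1 / variance).
    Returns the posterior mean and posterior standard deviation.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_mean * prior_precision
                 + estimate * data_precision) / post_precision
    return post_mean, sqrt(1.0 / post_precision)

# Five waves suggest an impact of around 30% (ASSUMED prior sd: 3 points).
# Wave six comes in at 10%, with an ASSUMED standard error.
noisy_mean, noisy_sd = update_normal(30.0, 3.0, 10.0, std_error=8.0)
precise_mean, precise_sd = update_normal(30.0, 3.0, 10.0, std_error=1.0)

print(f"noisy wave six:   {noisy_mean:.1f}% (sd {noisy_sd:.1f})")
print(f"precise wave six: {precise_mean:.1f}% (sd {precise_sd:.1f})")
```

If wave six is noisy, the posterior stays close to the 30% built up over five waves. If wave six is measured precisely, the posterior moves most of the way to 10%: the prior informs the result but does not override strong evidence. The posterior from wave six can then serve as the prior for wave seven.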

[Figure: Bayesian chart 2]
