
Reliability, Estimates and Significance Testing

 

It’s a striking fact that the level of reliability commonly used by marketing researchers is the same as that used in human studies for drugs.  One would assume the need for reliable estimates in the testing of new drugs is far greater than in business research, yet both fields set the same level of reliability.

There has been some movement to relax the constraints of high reliability and low risk in significance testing for business research.  This is an attempt not only to enable reporting of smaller changes that act as leading indicators in the marketplace, but also to align research results with what management a) expect and b) believe.

Tradition

High levels of reliability imply that estimates are right most of the time and so can be relied upon.  For example, if estimates are 90% reliable, they are likely to be correct 90 times out of 100 and incorrect ten times out of 100.  That is, there is a 10% chance that the estimate is misleading.
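The "90 times out of 100" reading can be checked with a small simulation.  The sketch below is illustrative only: the true mean, standard deviation, sample size and number of trials are all hypothetical, and the standard deviation is treated as known to keep the arithmetic simple.

```python
import random
from statistics import NormalDist

def coverage_rate(confidence=0.90, n=100, trials=2000,
                  true_mean=7.0, sd=2.0, seed=1):
    """Fraction of simulated samples whose confidence interval
    contains the true mean (all parameter values are hypothetical)."""
    rng = random.Random(seed)
    # Two-sided critical value for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sd / n ** 0.5  # sd assumed known, for simplicity
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        # The interval covers the true mean when the sample mean
        # lies within one half-width of it.
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / trials

print(f"Share of intervals containing the true mean: {coverage_rate():.1%}")
```

The printed share should sit close to 90%, with the remaining samples being the ones where the estimate misleads.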

Here we see the link between reliability and error.  The chance of error is the focus of significance testing, which is commonly conducted on sample estimates to determine whether an observed difference or change is larger than could reasonably be explained by sampling variation alone, that is, whether it is significant.

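Associated with this testing is a level of significance (often written as α), which represents the chance of making a Type I error.  A Type I error occurs when, for example, the sample analysis results in the conclusion that there has been a significant change in the average performance, when in fact there has not.  The level of significance is set in advance; the p-value calculated from the sample is then compared with it, and a result is declared significant when its p-value falls below that level.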

The chance of making a Type I error when conducting significance testing is the complement of the level of reliability of the estimate: a 95% level of reliability or confidence implies a 5% chance of making a Type I error.
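As a minimal sketch of this relationship, the snippet below converts a confidence level into its Type I error rate and into the two-sided critical value of a normal (z) test.  The function names are illustrative, and the use of a z-test is an assumption rather than something the article specifies.

```python
from statistics import NormalDist

def type_i_error_rate(confidence):
    """Chance of a Type I error implied by a confidence level."""
    return 1 - confidence

def critical_z(confidence):
    """Two-sided critical value of a z-test at the given confidence level."""
    alpha = type_i_error_rate(confidence)
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.95, 0.90, 0.80):
    print(f"{level:.0%} confidence: "
          f"Type I error rate {type_i_error_rate(level):.0%}, "
          f"critical z {critical_z(level):.3f}")
```

Lowering the confidence level lowers the critical value (roughly 1.96 at 95% and 1.64 at 90%), which is why smaller changes can clear the bar.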

Current Focus

Management’s focus is on actionability.  They want research findings to act as early warning signals by identifying any changes in the direction of performance, so that action can be taken before a problem builds.  However, reducing the chance of making Type I errors in significance testing often results in small changes not showing as significant; conversely, accepting smaller changes as significant involves less reliability and hence greater risk than reporting only large changes as significant.

Statisticians would argue that the increased chance of making a Type I error is a major drawback, but management are increasingly prepared to accept this risk.  In their view, incorrectly concluding that there has been a significant decline in performance when there has not (a Type I error) is less costly than standing by and doing nothing in the face of incremental declines that accumulate into trends that are difficult or expensive to reverse.

As Darren Stein, General Manager of Insights & Strategy, Consumer Marketing, Optus, has said: ‘we don’t need to be 95% sure things got worse before we take action.’

For significance tests on the variation of an organisation’s mean levels of performance from one year to the next, it has been proposed that levels of reliability can be advantageously reduced from the commonly used 95% to 90% or even less.  This involves increasing the risk of error from 5% to 10% or more, but management do not want their hands tied by having to reduce the risk of error to as low as the standard 5%.
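To make the effect of the threshold concrete, here is a hedged sketch of a year-on-year comparison of mean scores.  The survey figures are hypothetical and chosen only for illustration, and a two-sample z-test is assumed; an actual study might use a different test.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(mean_prev, sd_prev, n_prev, mean_curr, sd_curr, n_curr):
    """Two-sided z-test for a change in mean between two independent samples."""
    standard_error = sqrt(sd_prev ** 2 / n_prev + sd_curr ** 2 / n_curr)
    z = (mean_curr - mean_prev) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def is_significant(p_value, confidence):
    """A change is significant when its p-value is below 1 - confidence."""
    return p_value < 1 - confidence

# Hypothetical results: mean satisfaction fell from 7.40 to 7.15,
# with a standard deviation of 2.0 and 400 respondents in each year.
z, p = two_sample_z_test(7.40, 2.0, 400, 7.15, 2.0, 400)
print(f"z = {z:.2f}, p-value = {p:.3f}")
for confidence in (0.95, 0.90):
    verdict = "significant" if is_significant(p, confidence) else "not significant"
    print(f"At {confidence:.0%} reliability the decline is {verdict}")
```

With these illustrative numbers the same decline is not significant at 95% but is significant at 90%, which is the kind of early signal the relaxed threshold is meant to surface.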

This change in approach to significance testing is not cost-driven.  It is not a veiled attempt to reduce the costs of research through the smaller sample sizes that are required when lower levels of reliability are set.  The focus instead is on allowing management to be more responsive to smaller changes in company performance.
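For context on the sample-size effect mentioned above, the sketch below applies the standard formula for the sample size needed to estimate a mean within a given margin of error.  The standard deviation and margin are hypothetical values, and the standard deviation is assumed known.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(confidence, sd, margin_of_error):
    """Smallest n for which z * sd / sqrt(n) is within the margin of error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sd / margin_of_error) ** 2)

# Hypothetical inputs: standard deviation 2.0, margin of error 0.2 points.
for confidence in (0.95, 0.90):
    n = required_sample_size(confidence, sd=2.0, margin_of_error=0.2)
    print(f"{confidence:.0%} reliability: about {n} respondents")
```

Under these assumptions the requirement drops from roughly 385 respondents at 95% to roughly 271 at 90%; that saving exists, but it is not the motivation for the change described here.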

It also addresses the issue of reward that is contingent upon improvement in performance measures.  Management are keen for opportunities to provide a ‘pat on the back’ and encourage employees; however, this can be thwarted when few improvements in performance are reported as significant because every result must meet the standard, low level of risk.

The key point here is flexibility: being prepared to accept lower levels of reliability (and higher risk) in situations where detecting and acting on small decreases in performance is more important than avoiding the mistake of acting on declines that are not real.

In line with this approach, Darren Stein has said that he would prefer that all estimates were reported with their associated p-value.  The p-value is the probability of observing a change at least as large as the one measured if there had in fact been no real change in performance.  The smaller it is, the lower the risk of a Type I error in acting on the result, that is, the risk of concluding that there has been a significant increase or decrease in performance when in fact there has not.  When p-values are reported, management can scan the estimates, view their associated risk and then make their own decision as to their proposed action.
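A report of this kind might look like the following sketch, which lists each change alongside its p-value rather than a fixed significant/not significant flag.  The measure names and figures are hypothetical, and a two-sample z-test with equal sample sizes is assumed for simplicity.

```python
from math import sqrt
from statistics import NormalDist

def change_p_value(mean_prev, mean_curr, sd, n):
    """Two-sided p-value for a change in mean between two samples of
    size n each, assuming a common standard deviation (z-test)."""
    standard_error = sqrt(2 * sd ** 2 / n)
    z = (mean_curr - mean_prev) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def build_report(measures):
    """Return (name, change, p_value) rows, smallest p-value first."""
    rows = [(name, curr - prev, change_p_value(prev, curr, sd, n))
            for name, prev, curr, sd, n in measures]
    return sorted(rows, key=lambda row: row[2])

# Hypothetical measures: (name, last year, this year, sd, sample size).
measures = [
    ("Overall satisfaction", 7.40, 7.15, 2.0, 400),
    ("Likelihood to recommend", 6.90, 6.85, 2.2, 400),
    ("Value for money", 6.10, 6.45, 2.1, 400),
]

for name, change, p_value in build_report(measures):
    print(f"{name:<25} change {change:+.2f}   p-value {p_value:.3f}")
```

Sorting by p-value puts the changes least likely to be sampling noise at the top, leaving the decision about where to draw the line with the reader.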

 



Copyright Forethought. A division of Roberts Research Pty Ltd.