
Representativeness and Online Panels
By all accounts, the extent to which online data collection has replaced computer-assisted telephone interviewing (CATI) worldwide has been breathtaking. Yet amid the many success stories of online data collection cutting costs and turnaround times, troubling accounts of unexpected bias are also emerging.
Generally, online vendors have been swift to implement solutions as panel-related challenges have been identified. Increasingly, however, examples are emerging of far more intractable issues: issues concerning the representativeness of the panel, not in terms of readily identifiable demographics, but in terms of attitudes and behaviour. These challenges are much harder to address at the front end of data collection, and they have resulted in entire studies being discarded.
Take the example of a recent Australian online study designed to identify the drivers of market share in a technology-related consumer services market. Panel data was trialled after numerous waves of CATI-based research. The study was based on approximately n = 2,000 interviews, conducted quarterly using simple random sampling.
Under CATI, the study's predictions had achieved a correlation coefficient of 0.76 with actual market share (lagged by one quarter) over two years. The online panel results, however, were counter-intuitive and deviated sizeably from previous findings (refer to the chart, which shows the mean ratings for each of five predictors of market share). The deviation also directly contradicted client-based secondary data. This was despite extensive front-end work matching the panel to the Australian Bureau of Statistics demographic profile (age, gender and geographic region) to ensure the online sample was representative of the Australian population. Whilst a general ‘scale usage’ shift was expected, the ‘final nail in the coffin’ for the validity of the research came straight from left field: the major deviation affected only one supplier!
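The validation statistic described above, a correlation between predicted and actual market share lagged by one quarter, can be sketched as follows. This is a minimal illustration only: it assumes Pearson's r and assumes that the prediction made in quarter t is compared with the actual share observed in quarter t + 1. The function names and the share figures in the usage lines are hypothetical, and the original study's predictive model is not described in the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(predicted, actual, lag=1):
    """Correlate predicted share in quarter t with actual share in quarter t + lag.

    Assumes both series cover the same quarters, in the same order.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    # Drop the last `lag` predictions (no actual yet) and the first `lag` actuals.
    return pearson_r(predicted[: len(predicted) - lag], actual[lag:])


# Hypothetical quarterly shares (%) for one supplier: nine quarters give
# eight lagged pairs, roughly two years of validation.
predicted = [20.1, 21.0, 22.4, 21.8, 23.0, 24.1, 23.5, 24.8, 25.2]
actual = [19.5, 20.3, 21.2, 22.0, 21.9, 23.2, 24.0, 23.8, 24.9]
print(round(lagged_correlation(predicted, actual), 2))
```

Lagging by slicing keeps the pairing explicit: each prediction is matched only with the quarter it was meant to forecast.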
At considerable expense, it was decided to repeat the research using two additional, separate sources of data collection: the traditional CATI-based approach with an alternative CATI supplier, and another online study with an alternative panel. Analysis showed that the new CATI study was consistent with previous CATI studies and with validated secondary data while, once again, the online study produced spurious results.
So what went wrong with the panel data? The answer lay beyond the demographic markers used to validate the panel's representativeness, in psychographic characteristics: shared attitudes, interests and opinions. Painstaking analysis revealed a vast over-representation of one psychographic segment, characterised by a very negative attitude towards one specific supplier.
Chart note: Significance testing was conducted between waves at the 95% level of confidence. Broken lines indicate a significant difference.
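The note does not say which test was used. Assuming independent samples in each wave and the large wave sizes described earlier (about n = 2,000), a two-sample z-test on mean ratings is one common choice. The sketch below uses unpooled standard errors and a two-sided p-value; the function names and the summary statistics in the usage lines are hypothetical.

```python
import math


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Z statistic for the difference between two independent sample means.

    Uses unpooled standard errors, a large-sample approximation that is
    assumed (not stated in the source) to suit waves of about 2,000.
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se


def two_sided_p_value(z):
    """P(|Z| > |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


def significant(z, alpha=0.05):
    """True if the difference is significant at the (1 - alpha) confidence level."""
    return two_sided_p_value(z) < alpha


# Hypothetical mean ratings for one predictor in two consecutive waves.
z = two_sample_z(7.2, 1.8, 2000, 7.0, 1.9, 2000)
print(f"z = {z:.2f}, p = {two_sided_p_value(z):.4f}, significant: {significant(z)}")
```

With samples this large, even a 0.2-point shift in a mean rating can be statistically significant, which is one reason a general change in scale usage alone need not signal a flawed panel.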
For researchers, the most worrying question is this: how can we identify, ahead of time, biases in online panels that did not appear in the traditional, ubiquitous home telephone number sample? It would seem that ensuring a representative demographic profile is not always sufficient.
Perhaps the representativeness of the sample needs to be assessed not only on the usual demographics, but also on the psychographic profile of panel respondents. I am not sure how practical that requirement is. Presumably, psychographic representativeness is project-specific, which would most likely make the cost of profiling a panel on psychographic variables prohibitive, especially if one reason for using an online panel in the first place was to reduce costs relative to CATI. Then there is the thorny question of whether a research buyer would be prepared to reveal to a third-party panel operator the psychographic variables used in a proprietary segmentation scheme.
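If benchmark segment shares were available, for example from a parallel CATI sample (which is exactly the cost the paragraph above worries about), a panel's psychographic profile could be screened with a chi-square goodness-of-fit test plus a simple over-representation index. This is a minimal sketch of that swapped-in technique, not the method used in the study; the segment labels, counts and shares in the usage lines are hypothetical.

```python
# Chi-square critical values at the 5% significance level, df = 1..5.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def psychographic_check(panel_counts, benchmark_shares):
    """Compare panel segment counts against benchmark segment shares.

    panel_counts: dict segment -> number of panel respondents
    benchmark_shares: dict segment -> expected proportion (should sum to 1)
    Returns (chi-square statistic, degrees of freedom, index per segment),
    where index = panel share / benchmark share * 100 (100 = in line).
    """
    unknown = set(panel_counts) - set(benchmark_shares)
    if unknown:
        raise ValueError(f"segments missing from benchmark: {sorted(unknown)}")
    total = sum(panel_counts.values())
    chi2 = 0.0
    index = {}
    for segment, share in benchmark_shares.items():
        observed = panel_counts.get(segment, 0)
        expected = total * share
        chi2 += (observed - expected) ** 2 / expected
        index[segment] = 100.0 * (observed / total) / share
    return chi2, len(benchmark_shares) - 1, index


# Hypothetical example: four segments, equal benchmark shares.
chi2, df, index = psychographic_check(
    {"A": 400, "B": 500, "C": 500, "D": 600},
    {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25},
)
print(f"chi2 = {chi2:.1f} on {df} df, exceeds 5% critical value: {chi2 > CHI2_CRIT_05[df]}")
for segment, value in index.items():
    print(segment, round(value))
```

The index points to which segment drives a significant result; in the case described above, it would be the segment holding a very negative attitude towards one supplier.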
