
Speed and Technology in Data Analysis
Just because we can go faster does not mean we should.
The major source of speed for analysis has been automation. Some Marketing Scientists at Forethought can recall the grinding torment of waiting for their 50 megahertz computer in the mid-1990s to produce one regression model. In those days, they would often set up the analysis before they left for the day, and sometimes it would still be running when they arrived at the office the next morning. Now, using a 3.4 gigahertz machine, the Marketing Science team run automated routines that produce hundreds and sometimes thousands of iterations of regression models in the blink of an eye. Even the selection of the final model has been semi-automated, based on quality parameters. Unquestionably, the past decade has seen a quantum leap in speed and quality.
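To give a sense of what such an automated routine might look like, here is a minimal sketch in Python using only the standard library. It fits an ordinary least squares model for every combination of candidate predictors and ranks the models by adjusted R-squared. The choice of adjusted R-squared as the quality parameter, the function names and the sample figures are all assumptions made for illustration; they are not Forethought's actual routine or data.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def adjusted_r2(data, target, predictors):
    """Fit OLS (with intercept) via the normal equations; return adjusted R-squared."""
    y = data[target]
    n = len(y)
    rows = [[1.0] + [data[p][i] for p in predictors] for i in range(n)]
    k = len(predictors) + 1  # number of coefficients, including the intercept
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - k)


def rank_models(data, target, candidates):
    """Score every non-empty subset of candidate predictors, best first."""
    results = []
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            results.append((adjusted_r2(data, target, subset), subset))
    return sorted(results, reverse=True)


# Hypothetical data: y is driven almost entirely by x1.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [3, 1, 4, 1, 5, 9, 2, 6],
    "x3": [2, 7, 1, 8, 2, 8, 1, 8],
    "y": [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.0, 16.0],
}

for score, subset in rank_models(data, "y", ["x1", "x2", "x3"])[:3]:
    print(f"{score:.4f}  {subset}")
```

In a semi-automated workflow, the ranked shortlist would go to the analyst for the final choice rather than the top model being accepted blindly.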
Meanwhile, the quality movement has been increasingly interested in developing standards for marketing and social researchers, culminating in 2006 with the international standard ISO 20252. This standard is specific to the market research industry and covers all stages of a research study, including data analysis. One could argue that it has set some important speed limits on analysis by setting out necessary processing steps.
The ISO standard specifically calls for the analyst to ‘have in place procedures to ensure the tabulations and other outputs have been checked.’ One of the main premises of the quality drive has been that quality should be built into the process and not merely inspected at the end. And yet, one of the greatest needs for painstaking care in data analysis lies in checking the collected data at the end of the pilot or the full data collection, before analysis takes place.
War stories from the front line abound with tales of errors found only during file validation prior to analysis. Indeed, Forethought would strongly contend that if the analyst does not dedicate a good amount of time to file validation before analysis, errors will inevitably make their way to the client and, ultimately, into management’s decision making.
The problems Forethought has most commonly found during file validation include data export errors such as mislabelled questions; excessive ‘Not Applicable’ answers to a question; programmers not strictly following the code frames specified in the questionnaire; sample skews; outliers in the data; and data that does not approximate a normal distribution and is therefore unsuitable for some modelling.
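Several of these checks can be supported by simple flags that the analyst then reviews by hand. The following is a minimal sketch in Python using only the standard library. The function names, the ‘Not Applicable’ label, the 1.5 × IQR outlier rule and the use of skewness as a rough indicator of departure from normality are illustrative assumptions, not a prescribed procedure, and the sample answers are made up.

```python
from statistics import mean, pstdev, quantiles

NOT_APPLICABLE = "Not Applicable"  # assumed label; match it to the export file


def na_share(answers):
    """Proportion of answers to a question recorded as 'Not Applicable'."""
    return sum(a == NOT_APPLICABLE for a in answers) / len(answers)


def off_frame_codes(answers, code_frame):
    """Distinct answers that do not appear in the questionnaire's code frame."""
    return sorted({a for a in answers if a not in code_frame})


def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR below the first or above the third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def skewness(values):
    """Population skewness; values far from 0 suggest a non-normal shape."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return 0.0  # constant column: no spread, so no skew to report
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# Hypothetical export: one coded question and one numeric question.
q1_answers = ["1", "2", "3", NOT_APPLICABLE, NOT_APPLICABLE, "6", "2", NOT_APPLICABLE]
q1_frame = {"1", "2", "3", "4", "5", NOT_APPLICABLE}
monthly_spend = [20, 22, 19, 21, 23, 20, 22, 250]

print("Share of Not Applicable:", na_share(q1_answers))
print("Codes outside the frame:", off_frame_codes(q1_answers, q1_frame))
print("Possible outliers:", iqr_outliers(monthly_spend))
print("Skewness:", round(skewness(monthly_spend), 2))
```

Flags like these only point to where to look. Deciding whether a flagged value is a genuine response or an error still calls for the analyst's judgement.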
When Forethought first offered marketing research services, computer speed was as common a topic as internet speed is today. Some things have sped up, but some things like file validation should remain manual, as we make haste slowly.
