
The Boxplot: a Useful Yet Simple Analytical Tool

Part I

Marketing researchers are an increasingly capable group. Perhaps, in our rush to apply sophisticated techniques, we have forgotten some of the tried and true tools. Here is an application of a simple spreadsheet tool.

Your client is contemplating entering a new market and so has no historical data. You have been asked to establish the preferred purchase cycle for each segment of the market. Marketing research has provided a file comprising 280 respondents/companies who have been regularly purchasing the product from another source. In your data file there are three variables that can help with this question: Company Turnover, the size of the business (small, medium, large); Days, the usual number of days between purchases; and Usage, the annual usage ($'000).

It is best to start by getting a good sense of the data. To do this we can look at a set of standard descriptive measures, which help to provide an indication of the pattern in purchasing behaviour. However, these are often provided as a straight tabular list of 6-8 digit numbers, making it difficult to draw any insights.

One Solution

The key to unlocking the value in this data is provided by the Boxplot. Not a new tool, just an overlooked one. The boxplot graphically and simply presents the salient descriptive features of a data set. It can be used either to describe a single variable in a data set or to compare two (or more) variables.

Boxplot One

 

Boxplot Legend

  • Right and left of box indicate third and first quartiles.
  • Length of box equals interquartile range (IQR), box itself represents middle 50% of observations.
  • Height of box has no significance.
  • Vertical line inside box indicates median.
  • Point inside box indicates mean.
  • Horizontal lines from each side of box extend to most extreme observations that are no farther than 1.5 IQRs from box. Useful for indicating variability and skewness.
  • Observations farther than 1.5 IQRs from box shown as individual points.
  • If between 1.5 IQRs and 3 IQRs from box, called mild outliers and shown hollow.
  • Otherwise, called extreme outliers and shown solid.
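
The quantities in the legend can be computed directly. The following is a minimal sketch in Python, not the spreadsheet tool used for the boxplots in this article. The function name `boxplot_summary` and the Days values are hypothetical, invented purely for illustration, and the "inclusive" quartile convention is an assumption (different tools calculate quartiles slightly differently).

```python
from statistics import mean, median, quantiles


def boxplot_summary(values):
    """Compute the quantities described in the boxplot legend."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles; other tools may use another convention.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Inner fences sit 1.5 IQRs from the box, outer fences 3 IQRs from the box.
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    inside = [x for x in data if inner_low <= x <= inner_high]
    return {
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "iqr": iqr,
        "mean": mean(data),
        # The lines from the box end at the most extreme non-outlying observations.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "mild_outliers": [
            x for x in data
            if outer_low <= x < inner_low or inner_high < x <= outer_high
        ],
        "extreme_outliers": [x for x in data if x < outer_low or x > outer_high],
    }


# Hypothetical Days values for one business size (not the survey data).
days = [5, 7, 8, 9, 10, 10, 11, 12, 13, 14, 24, 60]
summary = boxplot_summary(days)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these made-up values, 24 falls between the inner and outer fences and is reported as a mild outlier, while 60 lies beyond the outer fence and is reported as an extreme outlier.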

Simply by working through the legend, we can see in this example, where Days has been plotted for each of the business sizes, that medium sized businesses have a slightly higher average number of days between purchases than large businesses and approximately double that of small businesses. This is just the tip of the iceberg. In next month’s edition, we will continue this comparative analysis using boxplots.

Part II

Continuing with our examination of the boxplot tool, let me first recap. Your client is contemplating entering a new market and so has no historical data. You have been asked to establish the preferred purchase cycle for each segment of the market. Marketing research has provided a file comprising 280 respondents/companies who have been regularly purchasing the product from another source. In your data file there are three variables that can help with this question: Company Turnover, the size of the business (small, medium, large); Days, the usual number of days between purchases; and Usage, the annual usage ($'000).

Using a simple side-by-side boxplot, we can continue our comparative analysis. In addition to the differences in averages discussed last month, it is important to note that, because there are some unusually large values of Days (outliers), the average is slightly greater than the median, indicating some skewness in the distribution. The outliers have therefore pulled the average artificially high, albeit only slightly. The same can be said for large businesses. Looking at each business size individually, the length of the box represents the spread of purchase days for the middle 50% of businesses surveyed. It shows that small businesses have the most consistent purchase cycle, with the least variability in the number of days between purchases, and that large businesses have the most variability, only slightly ahead of medium sized businesses. Such variability translates to difficulty in planning appropriate stock levels and is therefore well worth monitoring and a valid point of comparison.
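
The two comparisons made above, the gap between average and median as a sign of skewness and the box length (IQR) as a measure of consistency, can be sketched in a few lines of Python. The Days values below are hypothetical and are not the 280 survey responses; the helper name `spread_and_skew` and the "inclusive" quartile convention are likewise assumptions made for illustration.

```python
from statistics import mean, median, quantiles

# Hypothetical Days values per business size (invented for illustration).
days_by_size = {
    "small": [8, 9, 10, 10, 11, 11, 12, 13],
    "medium": [10, 14, 16, 18, 20, 22, 24, 48],
    "large": [8, 12, 15, 17, 19, 23, 26, 50],
}


def spread_and_skew(values):
    """Return the mean-median gap and the box length (IQR) for one group."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        # A positive gap suggests large outliers are pulling the mean upward.
        "mean_minus_median": mean(values) - median(values),
        # A shorter box means a more consistent purchase cycle.
        "iqr": q3 - q1,
    }


report = {size: spread_and_skew(v) for size, v in days_by_size.items()}
most_consistent = min(report, key=lambda size: report[size]["iqr"])

for size, stats in report.items():
    print(size, stats)
print("Most consistent purchase cycle:", most_consistent)
```

In this made-up data the single large value in the medium and large groups lifts the mean above the median, and the small group has the shortest box.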

Also, at a glance you are able to compare the 1st, 2nd and 3rd quartiles for each business size (also known as the 25th, 50th [median] and 75th percentiles respectively). For example, the usual number of days between purchases is less than 8.5 for 25% of small businesses surveyed. Or, for 75% of the medium sized businesses, the usual number of days between purchases is less than 24 days.
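
To see why a quartile can be read as "25% (or 50%, or 75%) of businesses fall below this value", here is a short Python sketch. The data are hypothetical, `share_below` is an illustrative helper, and the "inclusive" quartile convention is an assumption; with real data and ties the shares will only be approximately 25%, 50% and 75%.

```python
from statistics import quantiles


def share_below(values, threshold):
    """Fraction of observations strictly below the threshold."""
    return sum(1 for v in values if v < threshold) / len(values)


# Hypothetical Days values for one business size (not the survey data).
small_days = [6, 7, 8, 9, 10, 11, 12, 14]

q1, q2, q3 = quantiles(small_days, n=4, method="inclusive")
shares = [share_below(small_days, q) for q in (q1, q2, q3)]

for label, q, share in zip(("Q1", "Q2 (median)", "Q3"), (q1, q2, q3), shares):
    print(f"{label} = {q}: {share:.0%} of businesses purchase more often than this")
```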

All of these measures help to provide an indication of the pattern in purchasing behaviour, e.g. the number of days between purchases that most companies have, the lowest and highest annual usage, and the extent to which annual usage varies within each business size. Clearly, this is all very valuable information. If you were interested in comparing this information across the three business sizes, as you should be, a tabular presentation would need three tables of descriptive measures (one each for small, medium and large). Valuable information, but inaccessible.

Boxplot Two

With the aid of boxplots, a relative novice could continue this type of comparative discussion, full of pertinent, valuable information. Rediscover the beauty of the boxplot and take advantage of the knowledge it yields.



Copyright Forethought. A division of Roberts Research Pty Ltd.