Context is important. Consider the story of little Johnny, just returned from school with a test paper for his father to sign.
“Hi Dad, could you please sign this for me,” Johnny said uncertainly, handing the paper to his father.
“Mm, a 75 huh,” Dad said thoughtfully, pursing his lips, “what was the class average?”
“75,” said Johnny slowly.
“Oh, not bad then,” Dad smiled.
Suddenly feeling the need for full disclosure, Johnny continued, “We took the test on the day of the class trip to the aviary that I didn’t go on due to my irrational fear of being carried away by a giant eagle. Only three of us didn’t go – the teacher left a test for us to do so we wouldn’t be bored.”
Dad raised an eyebrow. “I see,” he said, “how did the other two do?”
“Well, there was a new kid who didn’t go because he didn’t have a signed consent form. He got a 50, but then he’d never studied fractions before…”
“Oh,” said Dad, quickly computing in his head, “so the other must’ve gotten a 100?”
“Yeah,” said Johnny, “but that was Joey DaBrain – he always gets a 100!”
“Hmm,” said Dad as he handed back the signed paper and slowly walked away, puzzling over whether Johnny really had performed acceptably on this test or whether it really even mattered.
While this scenario may seem slightly bizarre, it highlights the importance of context in evaluating performance. It also points to some troubling issues in how the communications research industry provides benchmarks to put research results in proper context for making crucial decisions about a brand’s communication strategy.
We may be comforted to know that our new advertisement tested “at norm,” but how confident can we be that a benchmark provided by our research supplier is truly relevant for our brand? Unfortunately, it may not be: traditional norms – typically category or industry averages – are affected by a variety of issues that may render them inappropriate or even misleading.
Representation: A benchmark can only be as strong as its representation. The average of the three students who took the math test is not likely to be representative of the average ability of all 25 students in the class. A norm computed for a particular category is likely to be composed of whatever the research supplier has at hand, rather than a representation of the category as a whole. And composition can clearly make a difference in practice.
Take, for example, average MSW●ARS CCPersuasion scores for four major brands in the same household products category, as shown in the following chart. The scores vary considerably from brand to brand. The average across all four brands is 7.7, but as the graphic illustrates, excluding either the strongest-scoring or the weakest-scoring brand can dramatically shift that average. The answer to “how are we doing?” for Brand A would be considerably different depending on the presence or absence of brands B or D in the normative computation.
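The sensitivity of a category norm to its composition is easy to see with a quick sketch. The brand-level scores below are hypothetical – the source gives only the four-brand average of 7.7, so these values were chosen solely to match that average – but the mechanics are the same for any composition:

```python
# Hypothetical CCPersuasion-style scores for four brands in one category.
# Illustrative values only: the actual brand scores are not given in the
# text; these are chosen so the four-brand average matches the cited 7.7.
scores = {"Brand A": 7.5, "Brand B": 9.8, "Brand C": 7.0, "Brand D": 6.5}

def category_norm(scores, exclude=()):
    """Average score across brands, optionally excluding some brands."""
    vals = [s for brand, s in scores.items() if brand not in exclude]
    return round(sum(vals) / len(vals), 1)

print(category_norm(scores))                       # all four brands -> 7.7
print(category_norm(scores, exclude={"Brand B"}))  # drop strongest -> 7.0
print(category_norm(scores, exclude={"Brand D"}))  # drop weakest   -> 8.1
```

With these illustrative numbers, Brand A’s 7.5 sits above the norm when the strongest brand is absent from the computation, but below it when the weakest brand is absent – the same ad, two opposite verdicts, purely from what the supplier happened to have on hand.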
Brand Development: We likely wouldn’t hold Johnny to the same standard as Joey DaBrain when it comes to results on a math test. Children have unique strengths and should be treated accordingly. The same is true for brands. Even if a benchmark were to account for all brands in a given category, it is not a given that this benchmark would then be appropriate for application to research results for all brands in the category.
This is because brands have unique situations that should be accounted for in assessing effectiveness of commercial communications. A brand team with a new entry to the category shouldn’t necessarily be held to the same standard as the category leader. As an example, the variation in purchase intent levels for seven different brands in a personal care category shows how a one-size-fits-all approach to normative data will give a misleading result for many brands.
Consistency: We may attempt to expand the context for understanding Johnny’s test result by considering the class next door, which recently had a test on fractions. However, the results for the other class are obviously affected by the teacher’s choice of questions and the typical difficulty level of his or her tests. Similar considerations are in play for normative data: if we are considering a verbal metric, results could be affected by such factors as question wording, the type of scale used, placement within the questionnaire and the sample group considered.
Even for a behavioral metric with a consistent and rigorously monitored methodology such as CCPersuasion, there can be differences in how brands in the same category define their competitive brand set, particularly in categories that are somewhat ambiguous or can be defined more broadly or narrowly. Such differences may again make comparisons to category averages less meaningful than first presumed.
Availability: In some cases, particularly in new or emerging categories, it may be difficult or impossible to formulate normative data for a specific category or even broader industry segment. Or for smaller categories, it may be necessary to reach far back in time to assemble sufficient cases, leaving the resulting norms susceptible to changes in market conditions, consumer sentiment or research methods.
Scope: A category norm requires historical test results for the metric of interest across a reasonably robust number of brands and overall cases. So by definition, such a metric will need to be general enough to be in broad use in the research industry. This will include common metrics such as liking, purchase intent, awareness and so forth. While benchmarks may be readily available for these metrics, this likely will not be the case for many of the brand-specific metrics that the brand team is particularly interested in, which leads to the last and perhaps most important issue with normative data – meaningfulness.
Meaningfulness: Beyond the appropriateness of the “class average” of 75 as a benchmark for Johnny’s performance on the math test, perhaps the larger issue was whether the test result was at all meaningful in predicting Johnny’s success in the course, given that it was likely a make-work exercise for the students not participating in the class trip. Similarly, while much effort may go into providing normative benchmarks for a battery of standard metrics, are the resulting comparisons useful to the brand?
Generally, a metric is useful in assessing commercial communication if it is either predictive enough of in-market effectiveness (typically sales response) to serve as an overall success criterion, or specific enough to the brand or category to guide revisions or future developmental work. Unfortunately, the metrics for which normative data is typically available, such as liking and recall/awareness, are too general to provide specific guidance to the brand, and they have been shown – for example, in matched-market advertising weight and copy tests – not to have a strong enough relationship to in-market effectiveness to serve as a success criterion.
Despite these issues, research managers desire context for their research results – and rightly so, as context is imperative. Part II of this series will highlight approaches pioneered by MSW●ARS that provide appropriate context for research results while avoiding the pitfalls which beset standard normative data.
Please contact your MSW●ARS representative to learn more.