Lost in Data Translation
September 09, 2013 | Jeff Fraser
In the Big Data era, small errors can lead to big problems
Back in his agency days, Kevin Keane was doing data analysis for an alcoholic beverage producer when he saw a major spike in historical sales data. The client couldn’t explain it. It was the perfect opportunity for Keane, a data junkie, to dig into the numbers and find an answer.
Keane, who today pilots a neuromarketing startup called BrainSights and co-chairs IAB Canada’s big data committee, says that analyzing data on drink sales might have told him the sales spike was caused by inspired advertising or a new seasonal consumer preference. That knowledge in hand, he and his team could recommend a marketing strategy to exploit the new relationship.
But it’s a good thing he didn’t, because his data was lying to him.
After a few hours of Googling, Keane’s team discovered sales were rising at the same time as liquor store workers in Ontario had threatened to go on strike; consumers were simply stocking up in case the stores closed. Smart as it is, big data couldn’t provide the context of the sales bump. If Keane had relied on it, he’d have arrived at a false insight, and a potential blunder for the client. “The data is not going to provide the whole story,” he says. “That’s where you need human experience.”
With advances in digital technology and software, marketers can mine the big data mountain for insights about consumer behaviour that used to be too expensive, too time-consuming, or just plain impossible to reach. It’s just a matter of getting intrepid data nerds to analyze the data and spit out the results.
The promise of big data is that all your answers are sitting somewhere in the billions of data points you now have access to. But that promise comes with a corresponding danger: that your answer is buried in thousands of almost-answers and outright lies. Data science is subject to the butterfly effect; small gaps in data can lead to outsized problems for businesses, driving them to launch a marketing campaign that hurts the brand, or waste a chunk of the budget on a data project that leads nowhere.
Data that hasn’t been collected or analyzed properly can lead to a number of difficult-to-detect statistical limitations and errors, any one of which can cause damage. How much damage false insights can do depends on how well marketers understand how to use data, and what kind of organizational infrastructure they’ve built to identify problems and react to them. Not understanding data fundamentals puts marketers at the mercy of people who do—vendors, competitors or even people within their organizations who can use misleading data to make their initiatives seem more efficient or effective than they are.
Keane says clients, agencies and tech vendors he’s worked with often fall into the trap of seeing big data as marketing’s silver bullet and not recognizing its limitations. “You have to both interpret the data and understand its limitations,” he cautions. “You have to ask yourself, what does this answer? And what can’t this answer?”
For the marketer seeking big data insight, the most common gaffe comes from what data experts call a spurious relationship: a false connection between variables and outcomes caused by statistical noise or bad modelling. Spurious relationships usually occur when a key variable is missing (like the looming strike behind the booze-sales spike), leading the analyst to overstate the influence of the variables that remain.
Brent Dykes, Adobe Canada’s evangelist for customer analytics, has had many encounters with spurious relationships. A client working on a website redesign once asked Dykes to compare user behaviour before and after the new site launched. Being an experienced data jockey, Dykes immediately saw that the data looked funky and that it would be wrong to credit the redesign for a surge in traffic. It turned out the client’s site launch coincided with a major industry conference, as well as a new version release of their product—both major influences on user behaviour.
Sources of data are always changing, says Dykes, and data collection methods that don’t change with them risk falling out of touch with reality. The fact that digital infrastructure—websites, applications, databases—is constantly being updated and reengineered creates another way to mess up the quality of data. Dykes offers a few more examples of websites whose traffic numbers started telling firms the wrong story about changes in the behaviour of their customers. One site’s apparent 50% drop in ad traffic was later linked to an IT update that changed the process for loading ads; another suffered a sharp 10% drop in pageviews which was later attributed to a redirection of mobile traffic.
Sample bias is another common error marketers should watch out for, says Brian Ross, president of LoyaltyOne’s shopper analytics arm Precima. In their enthusiasm for web analytics, marketers often forget that social media isn’t a great place to get a representative sample of customers. People who actively engage with brands on Twitter tend to not only be fans of those brands, but also young, better educated and well-off. If biases like this aren’t accounted for, they can skew every result and decision made down the road.
In the rush to incorporate online data into analysis, businesses have to be careful to look at the full life-cycle of a product, including customer behaviour both online and offline, warn both Ross and Dykes. While insights from online data are important, marketers sometimes overlook the complementary role of more traditional forms of offline data, like sales data and loyalty programs, adds Ross.
“Where you may get a thousand people responding to a survey to comment on price, if you have a million people who bought that product, you have a million data points to how they responded to that price. The marriage of both is powerful, not just one versus the other,” says Ross.
Avoiding big data errors is primarily an organizational challenge, rather than a technological one, says Dan Mallinger, chief scientist at data engineering firm Think Big Analytics. You can swing for a big data marketing home run, but the reality is it pays to double-check any data insight that seems unconventional or counterintuitive, he says. And if the insights aren’t robust in the face of more traditional market research like surveys and focus groups, chances are they aren’t strong enough to support strategic decisions.
Who’s vulnerable? Anyone who uses big data, it would appear. “Without a doubt, any company that has big data has spurious relationships in their models,” offers Mallinger, whose firm consults with businesses in retail, marketing, IT security and finance about how to evolve organizationally to make the most effective use of data.
A crucial challenge facing Think Big’s clients is how to foster a healthy relationship between two key camps: the data analysts who generate insights and the people who use them, such as executives, marketers and retailers (or “domain experts” in data jockey parlance). When domain experts and data analysts don’t have a close relationship, the experts have trouble understanding where data insights come from, Mallinger says. That may mean they will ignore those insights, or worse, treat them as unquestionable gospel, which can be damaging if the insight turns out to be misleading or false. Meanwhile, data scientists operating without input from domain experts are often ineffectual, driving data programs that are too broad, redundant, or lacking in clear objectives.
It’s a delicate balance, because there can also be a danger if the relationship is too close, points out Ross at Precima.
“People who know how to use data can ultimately manipulate data to tell a story,” he explains. Domain experts that take ownership of analytics face a real risk of seeing their data through rose-coloured glasses, which means they might selectively frame data investigations to get the answers they want or see only evidence to support their current strategy. He says marketers need to understand and engage with big data, but there is some merit in having a centralized, neutral IT department implement it.
Ross also worries that data analysis can go awry if domain experts aren’t coming to it with clear objectives. This can lead to what’s called data dredging—endlessly searching for correlations within a dataset, rather than accepting a lack of evidence. Mining for answers without any specific question (or, conversely, asking question after question until you find an answer) makes it more likely you will generate an insight that’s based more on bias or random error than truth. As Ronald Coase, Nobel prizewinning economist at the University of Chicago, once wrote: “If you torture the data long enough, it will confess.”
Ross says data science, like any science, starts with the quality of its hypotheses, and to formulate high-quality hypotheses analysts need to draw on the experience of marketers, retailers, financiers and other experts in the field. Businesses have to approach big data with specific strategic questions if they want actionable insights, what he calls a “small data strategy.”
Mallinger says the best relationship between the data analysts who generate insights and the domain experts who use them to craft strategy involves a middle man. His most successful clients have created something he calls a data architecture group, which serves as a liaison between IT and other departments.
The group serves as a stakeholder in decisions made by marketers, executives and other departments, and relays those departments’ needs back to IT in pre-digested form, while at the same time advising them on IT’s capabilities and insights so they can make strategic use of both.
Educational and Cultural Error
Joseph Leon, chief digital officer at Vision7 International and all-around digital ad expert, says that for Canadian agencies and advertisers, being misled by data isn’t just an organizational problem—it’s a cultural one. He says agencies are enthusiastic about providing a scientific demonstration of marketing success, but their competencies in statistics and data analysis are severely underdeveloped. If marketers can’t understand basic concepts like sampling and standard deviation, it’s difficult for them to assess the validity of data insights, know their scope and limitations, and avoid the intrusion of subjective biases.
“The relentless drive to demonstrate ROI has led to this whole bunch of really spurious business cases, whether it’s publishers or platforms who are really intent on selling their product, or agencies and brands just trying to demonstrate success in their campaigns,” he says. “Hardly a week goes by where I don’t get a vendor proposal on my desk with a fundamentally flawed business case, claiming these blatantly suspicious ROI figures.”
One client’s previous agency had advised them to double their search marketing bids on the basis of cost-of-sale figures. But when Leon reviewed their analysis, he found the figures were based on a grand total of only two sales. One fewer sale would have meant the client met target; one additional sale would have made the bids three times as efficient. The tiny sample size created a massive margin of error, and yet it had been enough to alter bidding strategy.
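The arithmetic behind Leon’s point is easy to make concrete. The spend figure below is hypothetical (the article gives only the sale count), and the interval uses the standard exact Poisson bounds for an observed count of 2, but the conclusion is the same at any budget: two conversions tell you almost nothing about cost per sale.

```python
# Hypothetical: $6,000 of search spend produced exactly 2 sales.
spend, conversions = 6000, 2

# One sale either way swings the headline metric enormously.
for k in (conversions - 1, conversions, conversions + 1):
    print(f"{k} sale(s): cost per sale = ${spend / k:,.0f}")

# Exact 95% Poisson bounds for an observed count of 2 (chi-square
# method): the true expected count lies between ~0.242 and ~7.225.
lo, hi = 0.242, 7.225
print(f"95% CI on cost per sale: ${spend / hi:,.0f} to ${spend / lo:,.0f}")
```

On this evidence the true cost per sale could plausibly sit anywhere from roughly $800 to roughly $25,000, a thirty-fold margin of error hiding behind a single confident-looking number.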
If the marketers in charge of the account had a better understanding of statistics, this flawed business move could have been avoided, Leon says. But the problem goes beyond specific data errors. At an industry level, a lack of understanding of big data exposes agencies and brands to exploitation, by vendors or other agencies who use flawed data analysis to support ROI.
For instance, marketers are generally aware of the flaws in last-click attribution methodology, which attributes 100% of conversion influence to the last ad a consumer clicked on—but what they can’t do is assess the quality of competing attribution solutions. Ad tech vendors or publishers can overemphasize the conversion influence of the advertising channels they offer, backing the claim with convincing-looking data that marketers have no way to substantiate. One concrete example is cookie stuffing—flooding the web with third-party cookies to inflate how much engagement a user has had with an ad channel.
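To see why the choice of attribution model matters so much, compare last-click against a simple linear model on a handful of toy conversion paths (the channels and paths below are invented): the same data credits a channel with all of the conversions under one model and well under half under the other.

```python
from collections import Counter

# Hypothetical converting paths: ordered lists of channel touchpoints.
paths = [
    ["display", "email", "search"],
    ["social", "search"],
    ["display", "display", "search"],
    ["email", "search"],
]

last_click, linear = Counter(), Counter()
for path in paths:
    last_click[path[-1]] += 1.0        # all credit to the final touch
    for channel in path:
        linear[channel] += 1.0 / len(path)  # credit split evenly

print("last-click:", dict(last_click))
print("linear:    ", dict(linear))
```

Search, the typical final touch, sweeps every conversion under last-click but earns well under half the credit under the linear split. A vendor selling search inventory has an obvious interest in which model the client adopts.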
Leon says this exploitation isn’t always intentional because vendors can make their own mistakes with analysis as well. But he fears a lack of understanding of statistics could lead the ad industry to adopt flawed measurement methodologies, which would mean less efficiency and a lower return for everyone except those doing the measuring.
“In emerging fields of media—in this case big data, analytics, ad tech, etc.—there is a tendency for the leaders in that field to capitalize on the naïveté of brands, and in some cases agencies as well. Sometimes if they don’t get it, and there’s lots of big numbers and tech-y stuff, it has a bedazzling effect. It’s almost as if the fact that they don’t understand it is reassuring.”
No Doomsday Scenarios
As advertising becomes more reliant on big data, there is more risk that it could get drawn into the kind of volatility that has a habit of ripping through financial markets.
Scare stories, like a textbook about flies that was marked up to $23 million on Amazon by a runaway competitive pricing algorithm, or the trading algorithm that lost Knight Capital Group $440 million on a single day last August, make us wary about entrusting campaigns to data-driven optimization and programmatic buying platforms.
But Leon says we’re a long way from having to worry about doomsday scenarios. “Let’s not overstate the comparisons with the stock market, where you have full automation in terms of stock-trading, and machine-learning well beyond the majority of the media industry. The fact is that we do not rely on machines to predict and optimize performance on a daily basis, for all of our marketing.”
In marketing and retail, humans still make the big decisions about how to handle data insights, and that creates a buffer between data error and business disaster. Careful, data-savvy marketers can still catch dishonest data before it does any real damage, and they can design strategy with built-in checks and balances to filter out uncertain data insights. The more effective marketers are at managing, analyzing and understanding data, the less likely they will be to succumb to data error.
For now, Leon says, the biggest challenge is the question of how to generate and use data insights most effectively. “Right now, the ability to mine significant amounts of data to glean meaningful insight is there, but we’re not doing it right yet. We need to focus on harnessing the insights that it can deliver to us.”