What You Need to Know About Big Data: the Heavy, the Messy, and the Misleading

Piet Hein van Dam
Feb 06 2015
Digital Media

TV ads

In 1941 the first TV ad ever was broadcast (you can still find it on YouTube): a 9-second still of a Bulova clock with an impressive voice-over, “America runs on Bulova time”. It is a nice example of what happens when you have one medium, print magazines, and a new one comes along, TV. The first primitive TV ads were just reproductions of existing print ads, plus the new feature that TV added: voice. Voice + print ad = new TV ad. It cost $9, for 9 seconds.

Online ads

Recently, the 2015 Super Bowl ads were shown at a price of about $4.5 million for a 30-second spot. A quick and dirty calculation says that from 1941 to 2015 the value of TV advertising increased from $1/sec to $150,000/sec. Wow!
It has grown a tremendous 150,000 times in 74 years. That growth did not happen by itself. It took a huge industry effort, by advertisers, agencies, networks and researchers, to make sure that every spot is delivered to the right target group, and that every second of exposure counts and has impact.
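The quick and dirty calculation above can be reproduced in a few lines. This is a sketch using only the figures quoted in this article, in nominal dollars with no inflation adjustment; the constant names are illustrative. The last step, an implied constant yearly growth factor, is an extra derived number rather than part of the original calculation.

```python
# Figures as quoted in the article (nominal dollars, not adjusted for inflation).
BULOVA_1941_PRICE = 9.0        # dollars
BULOVA_1941_SECONDS = 9
SUPER_BOWL_2015_PRICE = 4.5e6  # dollars, approximate
SUPER_BOWL_2015_SECONDS = 30
YEARS = 2015 - 1941            # 74 years

price_per_sec_1941 = BULOVA_1941_PRICE / BULOVA_1941_SECONDS          # $1/sec
price_per_sec_2015 = SUPER_BOWL_2015_PRICE / SUPER_BOWL_2015_SECONDS  # $150,000/sec
growth_factor = price_per_sec_2015 / price_per_sec_1941               # 150,000x

# Constant yearly factor that compounds to the same total over 74 years.
annual_factor = growth_factor ** (1 / YEARS)

print(f"${price_per_sec_1941:,.0f}/sec -> ${price_per_sec_2015:,.0f}/sec")
print(f"growth x{growth_factor:,.0f}, about {100 * (annual_factor - 1):.1f}% per year")
```

Spread over 74 years, a 150,000-fold increase works out to roughly 17.5% per year, compounded.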

In October 1994 the first-ever online banner ad was shown to the readers of Hotwired.com. It was sponsored by AT&T, and 44% of those who saw the ad actually clicked.

Today we are 20 years further down the road. We may ask ourselves: are we on the right track? Are we on our way to multiplying the value of an online banner ad 150,000 times in 74 years? Well, huh? … We’re not there yet, and something tells me we are not on the right track. If you’re not convinced, read Adcontrarian.com or the Harvard Business Review.

Drilling for oil

At an Arabnet conference last year I spoke with a statistical analyst. We both happened to have a PhD in statistical physics, so we ended up talking about pretty nerdy stuff. He asked a very interesting question that I still cannot answer. Drilling for oil is an important process in the Middle East, and he had been involved in real-time calculations of the probability of hitting a new oil well while drilling for it. Supposedly, when the chance of success dropped below 85%, they abandoned this expensive search for a new well. He asked me why we don’t do the same in online advertising. “Isn’t a lot of money involved?” “Uh, yes, around $140 billion in 2014.” “Can you measure at all? Do you have real-time data?” “Actually, yes, we do.” “So what are you waiting for?” “… well … nothing, I guess …”
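The analyst’s stopping rule can be sketched for an ad campaign. This is a hypothetical translation, not a method used in the oil industry or by the author: it assumes that “success” means a click, that the campaign needs a true click rate above some target (the 1% below is made up), and a uniform Beta(1, 1) prior on that rate. The posterior probability that the rate beats the target is recomputed as data arrive, and spending stops once it falls below the 85% from the anecdote.

```python
import math


def prob_rate_above(clicks: int, impressions: int, target: float) -> float:
    """Posterior probability that the true click rate exceeds `target`.

    Assumes a uniform Beta(1, 1) prior, so the posterior is Beta(a, b) with
    a = clicks + 1 and b = impressions - clicks + 1. For integer a and b,
    P(rate > target) = P(Y <= a - 1) where Y ~ Binomial(a + b - 1, target).
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    if not 0 <= clicks <= impressions:
        raise ValueError("need 0 <= clicks <= impressions")
    a = clicks + 1
    n = impressions + 1  # equals a + b - 1
    log_p, log_q = math.log(target), math.log1p(-target)
    total = 0.0
    for k in range(a):  # k = 0 .. a - 1
        # Binomial probability mass in log space, to avoid overflow for large n.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return min(1.0, total)


def should_continue(clicks: int, impressions: int, target: float,
                    threshold: float = 0.85) -> bool:
    """Keep spending only while P(rate > target) is at least `threshold`."""
    return prob_rate_above(clicks, impressions, target) >= threshold


# Hypothetical campaign that needs a click rate above 1% to pay off.
print(should_continue(30, 1000, target=0.01))  # True: 3% observed, keep going
print(should_continue(5, 1000, target=0.01))   # False: about 7% chance, stop
```

Recomputing this after every batch of impressions is the online counterpart of the real-time drilling calculation; in practice the target, the prior and the threshold would each need to be justified for the campaign at hand.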

Big data

So there is our solution: big data. We can measure anything, can’t we? The visibility of an online ad, the number of effective exposures, the reach in the target group, people’s direct and indirect responses; we can track their purchases and their GPS location. Anything you need to measure ad effectiveness. And most of the data is available in near-real time or real time, so we can adjust our campaign on the fly. In theory.
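To make a few of these measures concrete, here is a minimal sketch over a hypothetical impression log. The field names and metric definitions are assumptions for illustration: an exposure counts only if the impression was viewable, reach is the share of the target group that saw at least one viewable impression, and frequency is viewable exposures per reached target user. Industry definitions differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    user_id: str     # hypothetical pseudonymous user identifier
    in_target: bool  # user belongs to the campaign's target group
    viewable: bool   # ad was actually visible on screen


def campaign_metrics(log: list[Impression], target_population: int) -> dict[str, float]:
    """Viewability rate, target-group reach and average frequency among reached targets."""
    if not log or target_population <= 0:
        raise ValueError("need a non-empty log and a positive target population")
    viewable = [imp for imp in log if imp.viewable]
    # Assumption: only viewable impressions in the target group count as exposures.
    target_exposures = [imp for imp in viewable if imp.in_target]
    reached = {imp.user_id for imp in target_exposures}
    return {
        "viewability": len(viewable) / len(log),
        "target_reach": len(reached) / target_population,
        "avg_frequency": len(target_exposures) / len(reached) if reached else 0.0,
    }


log = [
    Impression("u1", in_target=True, viewable=True),
    Impression("u1", in_target=True, viewable=True),
    Impression("u2", in_target=True, viewable=False),
    Impression("u3", in_target=False, viewable=True),
]
print(campaign_metrics(log, target_population=10))
# {'viewability': 0.75, 'target_reach': 0.1, 'avg_frequency': 2.0}
```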

In practice, we rarely do that. Partly this may be caused by general industry resistance to certain innovations (death by a thousand paper cuts; let’s avoid at least a few). But it is also because there are some lessons to be learned first. Big data is not the ultimate solution. It is heavy, messy and misleading. Working with it is a human problem, not a technological one. Below are some of the lessons I have learned working in this area for the last 5 years.

How to organize


First, you have to organize yourself: get the right data sources, the right systems and the right expertise. When you look for available data sources, think of active or passive data (surveyed or measured, or a combination), and think of server-side data (on the advertiser’s side) or user-side data (on the receiver’s side). Server-side data is available in larger volumes, but with less depth. User-side data is deeper, but usually panel-based. Think of privacy: were all personally identifiable data obtained with informed consent and double opt-in? Also think of possible biases in the data (what is measured, and what is not?) and of representativeness (does it cover the whole population?).
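One common way to handle a panel that does not mirror the population is cell weighting (post-stratification), sketched below. The age groups, shares and function name are hypothetical. Weighting can only correct for groups the panel actually contains; a group with no panel members is a coverage gap, so the sketch raises an error instead of inventing data.

```python
from collections import Counter


def cell_weights(panel_groups: list[str], population_shares: dict[str, float]) -> dict[str, float]:
    """Post-stratification weight per group: population share / panel share."""
    counts = Counter(panel_groups)
    n = len(panel_groups)
    missing = sorted(g for g in population_shares if counts[g] == 0)
    if missing:
        # Weighting cannot fix a coverage gap: nobody in the panel represents these groups.
        raise ValueError(f"panel does not cover: {missing}")
    return {g: share / (counts[g] / n) for g, share in population_shares.items()}


# Hypothetical panel of 10 people, skewed young compared with the population.
panel = ["18-34"] * 6 + ["35+"] * 4
weights = cell_weights(panel, {"18-34": 0.3, "35+": 0.7})
print(weights)
```

With a panel that is 60% aged 18-34 but a population that is only 30% aged 18-34, each younger panelist gets a weight of about 0.5 and each older one about 1.75.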

When you think about systems, think storage and analytics. Big data are voluminous and arrive in real time, so they need to be analyzed fast. Their storage and analytics requirements are volatile, so you need to be able to scale up (and down) quickly.
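Analyzing fast usually means updating aggregates incrementally as events arrive, instead of re-scanning everything in storage. The class below is a minimal, hypothetical sketch of that idea: a click-through rate over a sliding time window, assuming events arrive in timestamp order. Memory grows with the number of events inside the window, not with the full history.

```python
from collections import deque


class SlidingWindowCTR:
    """Click-through rate over the last `window_s` seconds, updated per event.

    Assumes events arrive in timestamp order (a simplification; real streams
    often have to handle late or out-of-order events).
    """

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self.events = deque()  # (timestamp, clicked) pairs, oldest first
        self.clicks = 0

    def add(self, ts: float, clicked: bool) -> None:
        self.events.append((ts, clicked))
        self.clicks += int(clicked)
        # Evict events outside the window (ts - window_s, ts]; each event is
        # appended and removed once, so updates are amortized O(1).
        while self.events and self.events[0][0] <= ts - self.window_s:
            _, old_clicked = self.events.popleft()
            self.clicks -= int(old_clicked)

    def ctr(self) -> float:
        return self.clicks / len(self.events) if self.events else 0.0


window = SlidingWindowCTR(window_s=60)
window.add(0, clicked=True)
window.add(30, clicked=False)
print(window.ctr())  # 0.5
window.add(70, clicked=False)  # the event at t=0 leaves the window
print(window.ctr())  # 0.0
```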

The people who work with the data, usually data scientists, need the right combination of hacking skills, mathematical and statistical skills, and substantive expertise (knowledge of the industry).

Ground rules

Then, when you start working with the data, five rules apply (all derived from practice).

Combine methods: never trust a single data collection source. Always try to have two more, so that you can triangulate your results and obtain the most probable answer.

Start small: it is typical of big data that you get more than you asked for. When you work with it, start with a small sample, or even just one person, to understand what the data are saying. If you want to eat an elephant, do it one bite at a time.

Test and kill: after your first analyses, derive as many hypotheses as you can, and then start testing them. Kill each one as soon as the data falsify it, and keep working on the others.

Team above tool: the way your team collaborates is far more important than the tools you have. Do you see a data scientist working on a project without discussing his ideas and hypotheses with the client and with a technician? Then make sure he does.

Accuracy comes from you: big data may contain big errors. Big data are not necessarily representative. Big data reveal correlations, not causation. So when data come from everywhere, accuracy has to come from you!
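The first rule, combining methods, has a simple quantitative version. The sketch below uses inverse-variance weighting, a standard technique swapped in here for illustration rather than the author’s stated method: if the sources are independent and unbiased with roughly Gaussian errors, the weighted mean is the most probable (maximum-likelihood) value. The source names and figures are invented. A large deviation flags a source that disagrees with the others; a bias shared by all sources will not show up, which is why accuracy still has to come from you.

```python
import math


def triangulate(estimates: dict[str, tuple[float, float]]) -> tuple[float, float, dict[str, float]]:
    """Pool independent estimates of one quantity by inverse-variance weighting.

    `estimates` maps a source name to (value, standard_error). Returns the pooled
    value, its standard error, and each source's deviation from the pooled value
    measured in that source's own standard errors.
    """
    if len(estimates) < 2:
        raise ValueError("triangulation needs at least two sources")
    if any(se <= 0 for _, se in estimates.values()):
        raise ValueError("standard errors must be positive")
    # More precise sources (smaller standard error) get proportionally more weight.
    weights = {name: 1.0 / se ** 2 for name, (_, se) in estimates.items()}
    total = sum(weights.values())
    pooled = sum(weights[name] * value for name, (value, _) in estimates.items()) / total
    pooled_se = math.sqrt(1.0 / total)
    deviations = {name: (value - pooled) / se for name, (value, se) in estimates.items()}
    return pooled, pooled_se, deviations


# Hypothetical target-group reach from three sources: (estimate, standard error).
reach = {
    "panel": (0.30, 0.02),
    "server_log": (0.36, 0.01),
    "survey": (0.28, 0.04),
}
pooled, pooled_se, deviations = triangulate(reach)
print(f"pooled reach {pooled:.3f} +/- {pooled_se:.3f}")
for source, z in deviations.items():
    print(f"{source}: {z:+.1f} standard errors from the pooled value")
```

Here the pooled value lands near 0.345, pulled toward the most precise source, while the panel estimate sits a little over two of its own standard errors below it: a cue to look at that source more closely rather than to discard it.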

Finally

I hope these lessons contribute to the learning curve of online advertising. Let’s see if we can beat the curve of TV ads (150,000 times in 74 years, remember?).

About the author

Piet Hein van Dam is CEO of the Amsterdam-based internet startup Wakoopa. With a PhD in nonlinear dynamics, Piet Hein evolved into a business-developer type of CEO. He spent more than 10 years at Unilever and KPMG Consulting in international business development roles. In 2005, he became managing director of Motivaction International, a Dutch market research company. In 2011 he joined internet startup Wakoopa. The company provides user-centric online audience measurement technology to international market research agencies, advertisers, publishers and media agencies. Piet Hein is a regular speaker at conferences and the author of several articles on behavioural data collection.