There is now a strengthening industry resolve to remedy systemic failures across the display advertising ecosystem and to improve the quality of the ad inventory being traded. This is being led, in particular, by the IAB’s Traffic of Good Intent Task Force.
Much needs to be done. Members of the Task Force are starting to explore the problems facing the industry, the potential solutions, industry education and the implications for international law enforcement.
In this post we would like to make suggestions to facilitate the efforts of the Task Force in particular, and of the display advertising ecosystem more generally.
An Advertising Security Mailing List
Whilst it is encouraging that industry leaders have joined the Task Force and want to take responsibility for securing the legitimacy of the ad inventory being traded, we believe that anyone should be able to contribute to the conversations shaping the future of the industry, and that everyone should have access to any information that is provided.
The Bugtraq Mailing List provides an illustrative example of how we believe this should be done. Bugtraq is regarded as the leading general security mailing list. It is typically where information security vulnerabilities are first announced.
If a similar advertising security mailing list was created, then everyone in the ecosystem would be able to review and debate how best to secure the legitimacy of the ad inventory being traded. Everyone would also be able to disclose specific vulnerability details (like, say, this or this)—in much the same way that today’s information security researchers disclose specific vulnerabilities through Bugtraq.
If vulnerability details are disclosed, whether through an advertising security mailing list or otherwise, we suggest that the disclosed details always be specific. The credibility of our collective efforts to improve inventory quality is at stake, so as an industry we need to be vigilant against sweeping, unsubstantiated generalisations. We also suggest that all details be disclosed with due care: we believe strongly that names should not be given unless culpability is clear.
No More Disingenuous Scaremongering
It is disappointing to have to call out a peer, but last week provided a frustrating example of disingenuous scaremongering. This sort of announcement to the industry cannot continue—no matter the choice of medium. If the industry continues to be fed these sorts of untruths, the true facts will almost certainly be lost. We will collectively be like The Boy Who Cried Wolf.
Let’s consider the details. The company that issued the industry announcement is a captcha company. It offers two captcha products for the web. The first of these products, called a CAPTCHA TYPE-IN™ ad, is used across websites today to protect against automated submissions and spam. Instead of showing users traditional captchas, the company shows a proprietary ad format in which there is a text box and the user needs to type in an answer to some question related to the ad. This is not a traditional online display/video ad. It is a form of monetisable captcha. It is served when one would serve a captcha (say, to protect against a spam comment), not when one would serve a traditional ad. And engagement is what one would expect of a captcha; it is not indicative of engagement with a traditional ad.
The second of the company’s web-focused captcha products is called a Pre-Roll Video TYPE-IN™ ad. It provides someone who wants to watch an online video the option of skipping the associated pre-roll ad by typing the advertiser’s brand message into a text box. This product aims to increase human engagement with ads. It is not a way to protect video ads from bot activity. If a user does not enter anything in the text box of a Pre-Roll Video TYPE-IN ad, then this does not mean that the user is automated.
Given the nature of these two products, it seems more than just a little disingenuous of the company to have claimed that “bot traffic patterns remained consistent in a range of 24% to 29% for web advertising” when what the company actually meant was that “bot traffic patterns remained consistent in a range of 24% to 29% for [two types of TYPE-IN ad].” CAPTCHA TYPE-IN ads are not at all like traditional display/video ads; and Pre-Roll Video TYPE-IN ads do not protect against bots. So why did the company make their bold claim? To make matters even more perturbing, the company then extrapolated in some inexplicable way from a sample of 1.4 billion served captchas over two quarters this year and made a claim about the whole of the online display advertising industry: “the global digital advertising industry is on pace to waste up to $9.5 billion in 2013 advertising to bots.” Why make their claim even bolder?
No More Talk of Suspicious Traffic
Whilst last week’s release provides an extreme example, this is part of a broader industry affliction that we believe needs to be addressed urgently. Ambiguous language/terminology is being used widely to describe the industry’s problems with illegitimate ad impressions—both by the demand side and by the supply side—and this makes solving these problems difficult. Whilst we do not doubt that constructive and interesting work is being done by the various companies, we believe that collective efforts across the ecosystem are being undermined by persistent use of imprecise terminology. This is particularly important because the incentives of the demand side run wholly counter to the incentives of the supply side, and if ambiguous language continues to be used the two sides will never agree.
What should we read into each new set of statistics on suspicious ad traffic? Do the demand side and supply side have the same definition of suspicious traffic? Should we understand that suspicious traffic is not traffic of good intent, or that it is non-intentional traffic, or that it is fraudulent traffic, or even that suspicious traffic is botnet traffic?
Let us consider some particular examples. When ad requests are made via proxied security services like ScanSafe, do different companies label these as suspicious requests? When the TalkTalk parental control and malware checker masquerades as a human visitor to websites, do different companies class these as suspicious? Many malware checkers masquerade as legitimate surfers to search for exploit kits embedded within websites, because if they did not do this in a clandestine way, then nefarious website owners would serve different content to the malware checkers than they serve to legitimate website visitors. When ads are inadvertently served to auction bots on eBay, do different companies label this as fraud, noting that the motive for using auction bots is wholly unrelated to advertising? When ads are served on webpages that have been prerendered by Chrome browsers, do different companies label this as suspicious? Certainly ads should not be served on prerendered webpages, but surely “suspicious” is not the right term. This seems more akin to when ads are mistakenly served to Google’s Web Preview bot. Illegitimate? Yes. Suspicious? No.
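To make the point concrete, here is a minimal sketch, in Python, of how an ad request might be given a specific label rather than a catch-all “suspicious” flag. The header checks are illustrative assumptions, not an industry standard: `X-Moz: prefetch` is a Firefox prefetch signal and `X-Purpose: preview` a Safari preview signal, while Chrome prerendering is generally detectable client-side via the Page Visibility API rather than through request headers.

```python
# Sketch: return a specific label for an ad request instead of "suspicious".
# Header names and label strings are illustrative, not a standard.

def label_request(headers):
    ua = headers.get("User-Agent", "")
    if "Google Web Preview" in ua:
        # A preview bot: illegitimate to bill for, but not malicious.
        return "preview-bot"
    if headers.get("X-Purpose") == "preview" or headers.get("X-Moz") == "prefetch":
        # Prefetched/previewed page: the user may never actually see it.
        return "browser-prerender"
    if "Via" in headers or "X-Forwarded-For" in headers:
        # Relayed through a proxy, e.g. a proxied security service.
        return "proxied"
    return "no-known-issue"
```

The point of the sketch is only that each branch names a distinct, debatable category; whether a given category should be billable is then a separate, well-posed question.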
A Fine-Grained Taxonomy for Illegitimate Ad Requests
Our contention is that if inventory quality is to improve to the extent that it can and should, then it is imperative that the whole industry adopts more fine-grained terminology. If the demand side and the supply side are to engage in suitably constructive dialogue, then both sides need to be talking specifically about the same thing. Furthermore, if fine-grained categories are established then best-in-class solutions can be applied to each fine-grained category of problem. In today’s ambiguous world there is arguably more incentive to provide an inferior catch-all solution than a best-in-class targeted solution.
To this end, we would like to propose for industry consideration an initial taxonomy for illegitimate ad requests. These are the labels we currently apply to illegitimate ad requests:
- Hijacked device* with a fully automated browser (cf. bit.ly/137MdMh)
- Hijacked device* where the browsing session of the unwitting device owner is hijacked through a redirect/pop-up/pop-under (cf. mzl.la/Tf5Fv7)
- Hijacked device* where ads are injected into pages visited by the unwitting device owner (cf. bit.ly/169fYdN)
- Fraudulent ad hiding (cf. bit.ly/12ZiKS9)
- Crawler masquerading as a legitimate user
- Non-browser User-Agent header
- Cloud IP address
- Browser prerendering
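As a sketch of how such a taxonomy might be encoded, the labels above can be expressed as an enumeration (Python here purely for illustration; the member names are our own shorthand), so that each flagged impression carries exactly one specific label rather than a generic “suspicious” flag:

```python
from enum import Enum

# The proposed taxonomy as an enumeration. Member names are our own
# shorthand for the labels listed above; they are not a standard.
class IllegitimateRequest(Enum):
    HIJACKED_AUTOMATED_BROWSER = "hijacked device, fully automated browser"
    HIJACKED_SESSION_REDIRECT = "hijacked session via redirect/pop-up/pop-under"
    HIJACKED_AD_INJECTION = "hijacked device, ads injected into visited pages"
    FRAUDULENT_AD_HIDING = "fraudulent ad hiding"
    MASQUERADING_CRAWLER = "crawler masquerading as a legitimate user"
    NON_BROWSER_USER_AGENT = "non-browser User-Agent header"
    CLOUD_IP_ADDRESS = "cloud IP address"
    BROWSER_PRERENDERING = "browser prerendering"

# A report entry can then pair an impression count with a specific label:
report = {IllegitimateRequest.CLOUD_IP_ADDRESS: 1200,
          IllegitimateRequest.BROWSER_PRERENDERING: 340}
```

With labels pinned down like this, the demand side and the supply side can disagree about how to treat a category without disagreeing about what was measured.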
This taxonomy may not be complete, and it may not yet be granular enough. We welcome suggestions from across the ecosystem.
In particular, we anticipate brand-safety specialists wanting to add site-categorisation labels to the illegitimate-request taxonomy: labels like “webpage with hate speech” and “file-sharing website”. However, as we are not a brand-safety company (we don’t crawl websites or analyse website content), we will not be so bold here as to propose any brand-safety labels.
We look forward to hearing your suggestions.