blog: Don Marti


notes on ad-supported piracy

14 May 2023

(This is a blog version of a thread on federate.social, somewhat expanded and with more links added.)

The more I learn about how web ads work, the less impact it looks like Large Language Models (LLMs) will have on web publishing, whether they're the big-company ones or the underground ones that will do better than Big Tech can.

The pessimistic case is that LLMs trained on original human-written content will generate large quantities of derivative work and drive down ad rates. But the problem with that theory is that web ad rates are already being set by the lowest-effort sites. It's not hard to find big-budget brands running their ads on sites full of obvious copies of legit works available elsewhere. One well-known large Internet company both sells authorized e-books in a store app and runs the ads on pirate e-book sites offering the same content.

I know I mention e-books a lot when I talk about ad-supported piracy, but that's because of my own media consumption habits. I have a pretty good idea how to find out which publishers work with authors who write books in English, but I don't know anime. I'm also really weak on the streaming rights to UK and European football (soccer), and I don't know Bollywood movies or the e-book market in languages other than English. There are hella sites offering the same titles in all of those areas, and I doubt they're all legit, but my language and business knowledge is limited, so I can't tell. Bonus link about the book business and piracy, from Hugo-winning author Charles Stross.

Anyway, I have been checking a lot of these sites out, and I'm getting to the point where I have an outline of the ad-supported piracy racket. It's one of those emergent crime networks that grows in the cracks between different kinds of laws.

Step 1: This is the tricky part, and all the rest of the crime stuff falls into place once this is working. Make an advertising system that either hides or obfuscates the information on where the ads ran.

It is possible for advertisers, or their agencies, to get the same info I did, and get a better idea of how this problem affects a specific brand or campaign. If you work with an up-to-date, honest agency, and they use adtech partners who correctly implement the ads.txt, sellers.json, and SupplyChain industry standards, then it's possible to get a trustworthy report on where the ads appeared.
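To make the standards part a little more concrete: ads.txt is a plain-text file a site serves at /ads.txt, where each record lists an ad system's domain, the site's account ID on that system, and a relationship (DIRECT or RESELLER), optionally followed by a certification authority ID. Here's a minimal Python sketch of the first check an honest report depends on: does the site actually authorize the (ad system, seller ID) pair that an impression was sold under? The helper names parse_ads_txt and is_authorized, and the sample domains and IDs, are hypothetical. The sketch also skips much of the IAB Tech Lab spec (fetching over HTTPS, redirects, subdomain= handling), so treat it as an illustration, not an audit tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AdsTxtRecord:
    ad_system: str                        # domain of the ad system (exchange, SSP, ...)
    account_id: str                       # the site's seller/account ID on that system
    relationship: str                     # "DIRECT" or "RESELLER"
    cert_authority: Optional[str] = None  # optional certification authority ID


def parse_ads_txt(text: str) -> List[AdsTxtRecord]:
    """Parse the data records in an ads.txt file, skipping comments and variables."""
    records = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]           # drop comments
        line = line.split(";", 1)[0].strip()  # drop extension data after ";"
        if not line:
            continue
        # Variable lines look like "contact=adops@site.example": name=value, no comma in the name.
        if "=" in line and "," not in line.split("=", 1)[0]:
            continue
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3:
            continue  # malformed record: ignore it rather than guess
        relationship = fields[2].upper()
        if relationship not in ("DIRECT", "RESELLER"):
            continue
        cert = fields[3] if len(fields) > 3 and fields[3] else None
        records.append(AdsTxtRecord(fields[0].lower(), fields[1], relationship, cert))
    return records


def is_authorized(records: List[AdsTxtRecord], ad_system: str, seller_id: str) -> bool:
    """True if the site's ads.txt lists this (ad system, seller ID) pair."""
    ad_system = ad_system.lower()
    return any(r.ad_system == ad_system and r.account_id == seller_id for r in records)


if __name__ == "__main__":
    # Hypothetical file contents; real files are served at https://<site>/ads.txt
    sample = """\
# ads.txt for a made-up site
contact=adops@site.example
adexchange.example, pub-1234, DIRECT, abc123
ssp.example, 98765, RESELLER
"""
    records = parse_ads_txt(sample)
    print(is_authorized(records, "AdExchange.example", "pub-1234"))  # True
    print(is_authorized(records, "adexchange.example", "pub-9999"))  # False
```

Ad system domains are compared case-insensitively, but account IDs are compared exactly, since an ID that differs only in case may be a different account.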

In practice, not only do many adtech intermediaries fail to offer a tearsheet to the advertiser and agency, they actively obfuscate much of the data that would reveal ad-supported piracy if it were widely available. Some well-known adtech intermediaries throw huge amounts of development work at hiding it. And they're making it worse, not better. (At least one of the systems intended to replace third-party cookies seems to be designed to continue the leakage of audience data from legit to infringing sites.)
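One place the obfuscation shows up in the standards themselves is sellers.json. An ad system publishes a list of the sellers it pays, and each entry can be flagged is_confidential, in which case the name and domain may be left out. A SupplyChain object (schain) on a bid request lists each hop as a node with an asi (the ad system's domain) and a sid (the seller ID on that system), plus a complete flag saying whether the chain goes all the way back to the origin. The Python sketch below walks those nodes and flags every hop that can't be tied back to a named seller. The helpers index_sellers and audit_schain are hypothetical, the inputs are plain dicts shaped like the JSON in those specs, and fetching each ad system's sellers.json is left out.

```python
from typing import Dict, List


def index_sellers(sellers_json: dict) -> Dict[str, dict]:
    """Map seller_id -> seller entry for one ad system's parsed sellers.json."""
    return {str(s["seller_id"]): s for s in sellers_json.get("sellers", [])}


def audit_schain(schain: dict, sellers_by_system: Dict[str, Dict[str, dict]]) -> List[str]:
    """Return one problem description per hop that can't be traced to a named seller."""
    problems = []
    if schain.get("complete") != 1:
        problems.append("chain is incomplete: upstream hops are missing")
    for i, node in enumerate(schain.get("nodes", [])):
        asi = node.get("asi", "").lower()  # ad system domain for this hop
        sid = str(node.get("sid", ""))     # seller ID on that ad system
        sellers = sellers_by_system.get(asi)
        if sellers is None:
            problems.append(f"node {i}: no sellers.json loaded for {asi}")
            continue
        entry = sellers.get(sid)
        if entry is None:
            problems.append(f"node {i}: seller {sid} not listed by {asi}")
        elif entry.get("is_confidential") == 1:
            # Confidential entries may omit name and domain, so the site stays hidden.
            problems.append(f"node {i}: seller {sid} on {asi} is confidential")
    return problems
```

An empty list means every hop resolves to a named seller; it doesn't prove the site is legit, only that someone could go and look.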

short digression: This is why ad agencies got started in the first place. If you're a manufacturer in Connecticut making patent medicine or barbed wire or whatever, you have no idea which of the newspapers in the western USA are legit, and which are some no-good outlaw who prints one copy of your ad just for the tearsheet, then invoices you for reaching a whole town. So you hire an agency in Chicago to buy ads for you and check that the newspapers are real. From just handling the insertion orders, tearsheets, and invoices, the agency business model eventually grew to include making the ads. Are there any Western novels about an advertising agency media buyer who rides into town to sort out the real newspaper from the no-good cheating varmints sending faked-up tearsheets?

Step 2: Find your hard drive(s) full of copyright-infringing content, make a pirate site, and either sign up with a big company to get ads on it, or use a dark pool sales house to hide your pirate site under the same ad ID as a bunch of other sites.

Step 3: The pirate site is hidden from the advertisers, so they won't complain about paying full price for a low-effort site, but eventually the copyright holders will complain to the search engines to try to get the pirate site kicked out.

Step 4: That's where the pirate site points to their "DMCA PolicY" and responds: we're shocked to discover that "one of our users" uploaded that copyrighted work "without permission." The pirate site can claim a DMCA safe harbor as an online service provider, but it doesn't look like the safe harbor applies to the adtech firms or to the advertiser.

Step 5: Eventually the pirate site gets kicked out of search engines, but that's fine because they have already registered more domains and copied the content over.

Step 6: This is the strange part. Even when the search engine and the ad network are the same company, the pirate site operator can keep their ad account while losing domains to copyright complaints. I see the same ad accounts carrying over across 3 domains that appear to have been either kicked out of search or pulled down for copyright reasons.
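Public data can sometimes show this kind of carry-over. If a handful of domains, including ones already dropped from search, all list the same DIRECT account on the same ad system in their ads.txt files, that account has followed the content from one domain to the next. Under the ads.txt spec, DIRECT means the publisher controls the account, while RESELLER lines are routinely shared by many unrelated sites, so the hypothetical shared_accounts helper below only counts DIRECT lines by default. It's a Python sketch that assumes you've already fetched and parsed each domain's ads.txt into (ad system, account ID, relationship) tuples. A match is a lead to check by hand, not proof, since one legit publisher can run many sites under one account.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, str, str]  # (ad system domain, account ID, relationship)


def shared_accounts(
    ads_txt_by_domain: Dict[str, Iterable[Record]],
    direct_only: bool = True,
    min_domains: int = 2,
) -> Dict[Tuple[str, str], List[str]]:
    """Map (ad system, account ID) -> sorted domains listing it, for accounts on >= min_domains sites."""
    seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for domain, records in ads_txt_by_domain.items():
        for ad_system, account_id, relationship in records:
            if direct_only and relationship.upper() != "DIRECT":
                continue  # RESELLER lines are shared by many unrelated sites
            seen[(ad_system.lower(), account_id)].add(domain.lower())
    return {
        key: sorted(domains)
        for key, domains in seen.items()
        if len(domains) >= min_domains
    }
```

For example, passing three made-up pirate domains that each list the same DIRECT account, plus one unrelated site with its own account, returns just that shared account mapped to the three pirate domains.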

The crime here is behind the scenes: a site pops up with all or most of the same copyrighted content as the owner's last site, but when they get a DMCA takedown, they blame it on some "user upload." If, as the copyright holder or advertiser, you could find the site, then you could probably figure out what they're up to, but the way that web ads work is designed to make it hard to see where the ads ran.

Screenshot: a pirate copy of a computer book with a CVS ad. Example from a pirate e-book site, showing a book listing and the footer ad. (This is one of those "confidential" sites we hear so much about.)

Meanwhile, the advertiser could be insisting on accurate reporting, but most advertisers don't, and nobody at the agency wants to point out mistakes their client is making (even if the agency is honest and not somehow in on the racket). Anyway, LLMs are unlikely to be transformative here; they're more of a force multiplier for an existing business model.

More: the Linux device driver hacker’s guide to giant Internet monopoly dominance

AOC and Matt Gaetz teaming up to ban congressional stock trades could signal a generational divide

How Nissan won the internet with a four-hour ad

How Facebook and Instagram became marketplaces for child sex trafficking

The ‘big shift’ around children’s privacy

How should mobile apps prepare for California's privacy scrutiny?

Google calls for relaxing of Australia’s copyright laws so AI can mine websites for information

AI is taking the jobs of Kenyans who write essays for U.S. college students

The ‘Montana Consumer Data Privacy Act’ Reminds us that Privacy is Bipartisan

Google to deploy generative AI to create sophisticated ad campaigns

Is Micro-targeting Giving the Digital Advertising Sector a Bad Rap?