blog: Don Marti


generative ai antimoats

11 May 2023

(Update 14 May 2023: add meme and attention span paragraph.)

Read the whole thing: "We Have No Moat, And Neither Does OpenAI". To me, it looks more interesting than that. Big companies have sustainable anti-moats in generative AI. Independent and underground operations can do generative AI in ways that a big company can't.

First of all, filtering output. If you run a big service for everyone you have to hire a bunch of people to train it not to output something that could be illegal, somewhere. There are a bunch of these laws. You can't...

  • libel people in the UK

  • give a programmer in the USA some code to bypass a DRM system

  • insult the monarch of Thailand

  • give a user in Germany an article that denies a historic genocide

There are a lot of these. A Big Tech company doesn't know which users are going to report them for which crimes, so they have to try to keep everything as clean as possible for all users worldwide. An independent operation only has to comply with the law in their own jurisdiction. (Personally, I would not run any generative AI service, at any scale, in a country where the blasphemy law carries the death penalty, but that's just me.) Meanwhile, if you're running underground generative AI in the USA, you can probably get away with breaking a bunch of laws. Section 230 is your friend. Copy a trick from ad-supported pirate sites, which use a different immunity law (DMCA) to keep themselves out of trouble. Post the generative AI content from regular user accounts, and if anyone complains, play the Section 230 card and blame it on a user.

Second, training sets. Big companies keep records on how they use copyrighted works, and underground operators have ever-bigger hard drives full of material that they, or their friends, have been collecting since the 1990s. Copyright cases will affect Big Tech but not underground AI, which can use the same tricks for getting away with infringement that the regular ad-supported pirate sites already do.

Underground AI can also get training data from unsuspecting, unpaid users, like GPT-4chan did. Gamers and forum users are training underground AI right now. And how do you tell if the operator of the forum or game server is running the bots? Or are bots training on services offered by others? Or is the forum or game using the Section 230 "we're shocked to discover what one of our users was doing" trick?

Meme version of this blog post

Big Tech companies are law-abiding enough that they have to invest heavily in filters on generative AI, but they do enough crimes that they have to collaborate with the expectation that their email, commit messages and chat will eventually become public. (There is corporate training about how to email about crimes without having it come out in discovery, but it turns out not to work as well as they thought it would.) Big Tech AI projects are being watched by regulators, competitors, employees who are going to be working for a competitor next week, wannabe whistleblowers, independent researchers...and soon will have to open up a bunch of records to plaintiffs' lawyers. That's a combination that will lead to chilling effects on internal communications that underground AI operations don't have. (Yes, individual underground AI rings will get busted, but they're compartmentalized.)

And I almost forgot: the rent extraction and short attention span problems. Building a service on a Big Tech API is almost like opening a restaurant in a building owned by a San Francisco landlord: if you fail, you fail. If you succeed, they raise the rent on you. And, unlike commercial real estate leases, if you build on a Big Tech API, the landlord can kick you out at any time just because they get bored. Or, more likely, because the product manager nailed their OKRs, got promoted, and now the service is in the hands of someone who knows that you can't buy a house in Mountain View, California just by keeping an existing service from crashing. Steve Yegge explains it best.

Anyway, the generative AI frenzy was rushed out in response to the 2022 Big Tech union organizing wins, just as fast food robots were demoed before they were ready in response to "Fight for $15" a decade ago. But somehow it doesn't seem that they thought it through very well. That's probably a cultural problem, just because the instinct to do union-busting regardless of longer-term consequences is selected for so heavily. Anyway, AI is not going to be an advantage for the big incumbent companies that are still in the awkward halfway crook space.

More: notes on ad-supported piracy

Google promised to delete sensitive data. It logged my abortion clinic visit

It’s Time to Acknowledge Big Tech Was Always at Odds with Journalism

Washington's My Health My Data Act: Welcome to BIPA 2.0 (when the surveillance advertisers are freaking out this much about your law, it's probably a good law imho)

Will A.I. Become the New McKinsey? | The New Yorker

Tech job opportunities remain strong despite layoffs — here’s why

Facebook, Youtube and TikTok content moderators in Kenya form labour union

OpenAI’s hunger for data is coming back to bite it

Leaders, your mass layoffs may have an unintended side effect that hurts the company’s bottom line

ChatGPT creates mostly insecure code, but won't tell you unless you ask

AI Drake Just Set An Impossible Legal Trap For Google, by Nilay Patel, The Verge