blog: Don Marti


Topics API notes and links

10 September 2022

(Update 14 Sep 2022: add non-sensitive but possibly user-unacceptable use cases)

Some notes on how Google Topics API is likely to affect various web stakeholders, in order by priority of constituencies. Numbered links are to issues in the GitHub repository.

Users

  • Topics API rewards deceptive sites and traffic sourcing practices. Topics API leaks valuable audience data from one site to another. As I wrote on AdExchanger, this is a problem that it shares with third-party cookies—enabling the monetization of low-value sites of little interest to users. As the proposal currently stands, any site that a user can be tricked into visiting, or that their device can be manipulated into visiting, is in a position to collect ad revenue based on Topics API data that the browser supplies.

    Topics API gives people an additional incentive to create deceptive sites and drive traffic to them by deceptive or harmful means. Operators of deceptive sites will see users as Topics piñatas—hit them with something like email spam or a deceptive link, get them to drop valuable Topics, profit. Technically this is no worse than third-party cookies, but third-party cookies are on the way out for a reason, and Topics API is positioned as a longer-term thing.

  • Topics API enables price discrimination. Topics API would help retail, travel, and other sites identify more or less price-sensitive users based on their interests. And Topics API is more likely to be used for personalized pricing than other data sources, because it looks safer. Many existing sources of data, such as geolocation or device info, might be correlated with membership in legally protected groups of people. However, Topics API is carefully intended not to be tied to sensitive information, which lowers the perceived risks of using the API for personalized pricing. (see #34)

    Topics API discrimination does have some public sector uses that might be acceptable in some jurisdictions but not others. Users in the USA might accept the TSA using Topics API to optimize passenger screening, the IRS using it to select taxpayers for audits, or the CDC using it for public health data collection. But some other countries might use it for purposes that are not acceptable to some users there.

  • Topics API could reveal sensitive information or identify individuals in combination with other data. Some privacy concerns for users are covered in a blog post from DuckDuckGo.
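The leakage described in the first bullet comes from how the browser decides what to hand a calling site. Below is a minimal Python sketch of a simplified model, based on our reading of the Topics API explainer. In this model, the browser ranks a user's topics per weekly epoch and keeps the top five. Each calling site then gets one topic per epoch from the last three epochs, chosen pseudo-randomly but stable for that site. The visit data, topic labels, site name, and function names are hypothetical. The sketch also leaves out the explainer's per-caller filtering, any per-user secret, and its small chance of returning a random topic instead.

```python
import hashlib
from collections import Counter

TOP_N = 5        # topics kept per epoch (top five, per the explainer)
EPOCHS_BACK = 3  # how many past epochs a call can draw from


def top_topics(epoch_visits):
    """Rank one epoch's topics by visit count and keep the top TOP_N."""
    return [topic for topic, _ in Counter(epoch_visits).most_common(TOP_N)]


def topics_for_site(epochs, site):
    """Return one topic per recent epoch for `site` (simplified model).

    A hash of (epoch, site) stands in for the browser's random choice,
    so repeated calls from the same site in the same epoch see the same
    topic, while different sites may see different topics.
    """
    result = []
    start = max(0, len(epochs) - EPOCHS_BACK)
    for epoch_id in range(start, len(epochs)):
        top = top_topics(epochs[epoch_id])
        if not top:
            continue  # nothing observed in this epoch
        digest = hashlib.sha256(f"{epoch_id}|{site}".encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(top)
        result.append(top[index])
    return result


# Hypothetical history: each epoch is a list of topic labels, one per visit.
history = [
    ["Running", "Running", "Mortgages", "Knitting"],
    ["Mortgages", "Mortgages", "Running", "Travel"],
    ["Knitting", "Travel", "Travel", "Running"],
]

# A site the user reached once through a spam link still receives
# topics learned from everything else the user did.
print(topics_for_site(history, "deceptive.example"))
```

Nothing in this model depends on the calling site having any content of its own, which is the incentive problem the bullets above describe.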

Some uses of Topics API outside the conventional publisher/advertiser use cases would be acceptable to some users but not to others, even if Topics API succeeds in preventing a caller from detecting a sensitive user characteristic.

Some jurisdictions might require sites to include a public-sector caller, so not all callers are necessarily chosen or approved by the first party.

  • HMOs or public health services classifying patients by healthy/unhealthy habits

  • LMSs classifying diligent and distracted students

  • Tax authorities classifying likely big-spending users for tax audits

Some users, about 31-36% of them, prefer personalized advertising, but a future "personalize my ads" feature would need to be designed to avoid user harms that would affect both personalizers and non-personalizers.

Web page authors

  • Topics API enables leakage of audience information across sites. The main beneficiaries of Topics API audience data leakage would be sites that advertisers would not choose to support, along with the adtech intermediaries that enable them. Ad revenue that goes to sanctioned, copyright-infringing, brand-unsafe or other problem sites is not available to the authors of legit web pages (and a waste for advertisers, but they're not on the list of constituencies that I'm using). Cross-site data leakage is a problem for users because of the incentives covered above, but also a problem for legit web sites because of revenue. (And yes, it is better for users when ad revenue goes to sites they like, and not sites they got tricked into clicking through to.)

  • Topics API is biased in favor of large, general-interest sites. Topics API assigns only one or a few topics to each domain. A niche site that covers one or a few topics would provide significant amounts of commercially valuable data to the system, while a large site such as YouTube, which covers every topic, would provide no usable information. This is a pretty obvious problem for competition policy, so I'm kind of surprised they left it in.

  • Topics API encourages consolidation of third party services. The more sites that a third-party service runs on, the more useful information it can collect from Topics API. Since an individual site can't get usable information out of Topics API, sites will have to A/B test different combinations of third-party iframes as callers. Sites will keep callers that provide more good Topics, and drop others. After repeated cycles of A/B tests, where callers that get dropped by one site start providing worse data to other sites and get dropped there too, we would be left with one domain doing all the Topics API calls. (see #82) This is also a pretty glaring antitrust issue.
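Both of the last two bullets follow from the rule that a caller only receives topics it has itself observed. Below is a minimal Python sketch of that rule. It assumes, per our reading of the explainer, that a caller may receive a topic only if that same caller ran on a site classified under that topic during the relevant epochs. The site names, topic labels, and site-to-topic table are hypothetical, and the sketch ignores epochs and per-epoch random selection.

```python
# Hypothetical site classification: one or a few topics per site.
SITE_TOPICS = {
    "knitting.example": {"Knitting"},
    "marathons.example": {"Running"},
    "mortgages.example": {"Mortgages"},
    # A general-interest site gets one broad label that says little about the user.
    "everything.example": {"Online Video"},
}


def observed_topics(visited_sites, caller_sites):
    """Topics of visited sites on which this caller was embedded."""
    seen = set()
    for site in visited_sites:
        if site in caller_sites:
            seen |= SITE_TOPICS.get(site, set())
    return seen


def receivable_topics(user_top_topics, visited_sites, caller_sites):
    """Filter the user's top topics down to the ones this caller observed."""
    seen = observed_topics(visited_sites, caller_sites)
    return [t for t in user_top_topics if t in seen]


visited = ["knitting.example", "marathons.example",
           "mortgages.example", "everything.example"]
user_top = ["Knitting", "Running", "Mortgages", "Online Video"]

# A caller embedded only on the general-interest site learns one broad topic.
small_caller = {"everything.example"}
# A caller embedded on every site can be handed every niche topic.
big_caller = set(visited)

print(receivable_topics(user_top, visited, small_caller))  # ['Online Video']
print(receivable_topics(user_top, visited, big_caller))
# ['Knitting', 'Running', 'Mortgages', 'Online Video']
```

Under this model, each additional site that embeds a caller widens what that caller can be given, which is the feedback loop behind the A/B testing and consolidation argument.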

The Topics API README states that,

One of (Interest-based advertising)’s benefits is that it allows sites that are useful to the user, but perhaps could not be easily monetized via contextual advertising, to display more relevant ads to the user than they otherwise could, helping to fund the sites that the user visits.

This goal can be achieved without the extensive leakage of site audience data made possible by Topics API. Seller-Defined Audiences is an alternative that would give a data gatekeeper role to sites that produce trusted content, which would help avoid creating incentives to harm users and increase revenue opportunities for web publishers that users trust.

Web authors and publishers might be able to ameliorate some of the effects of Topics API by, for example, opting out their signed-in users, or opting out of Topics API data collection entirely until YouTube channels are treated as separate areas with their own topic sets, on an equal footing with domains (see #17). But we still have the SEO problem hanging over us. Google has not stated that they have ruled out having Topics API usage affect search rank, so sites might not have much room to work here.
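For the opt-out options above, a site can send a `Permissions-Policy` response header with an empty allowlist for the `browsing-topics` feature, which disables Topics API for that page and the frames it embeds. Below is a minimal Python sketch of choosing that header per request. The function name and the signed-in flag are hypothetical. A site that already sends a `Permissions-Policy` header would need to merge this directive into the existing value rather than overwrite it.

```python
OPT_OUT_DIRECTIVE = "browsing-topics=()"  # empty allowlist: feature disabled


def topics_opt_out_headers(user_signed_in, opt_out_everyone=False):
    """Return extra response headers for one request (hypothetical helper).

    Sends the opt-out for signed-in users, or for every visitor when
    opt_out_everyone is True; otherwise adds nothing.
    """
    if opt_out_everyone or user_signed_in:
        return {"Permissions-Policy": OPT_OUT_DIRECTIVE}
    return {}


print(topics_opt_out_headers(user_signed_in=True))
# {'Permissions-Policy': 'browsing-topics=()'}
print(topics_opt_out_headers(user_signed_in=False))
# {}
```

Opting out only signed-in users still leaves anonymous visits available to Topics API callers; passing `opt_out_everyone=True` covers the "opt out entirely" case.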

User agent implementors

User agent implementors have limited developer and tester time available to implement and support new features. All software projects must leave some good ideas un-implemented in order to devote time to the features they decide to do. And as the feature set of a browser grows, so does the required ongoing maintenance. Choosing to implement and support Topics API, even for a large company, would mean not doing some other, more useful feature. Some Google Chrome features are good, and out ahead of other browsers. It would be a waste to keep putting time into this one.

'Likely to cause substantial injury:' Why the FTC put Kochava in the spotlight

You’re being tracked through your email. Here’s how to stop it.

Apple Already Decimated Meta’s Adtech Empire. Now, It’s Honing In On Facebook's Advertisers, Too., by Shoshana Wodinsky, MarketWatch

Inside Fog Data Science, the Secretive Company Selling Mass Surveillance to Local Police

We disabled Google AMP at Tribune Publishing