blog: Don Marti


Your cohorts are just ethnic affinity groups. Change my mind.

12 March 2021

(Update 28 Feb 2022: Berke and Calacci link)

(Update 9 May 2021: add another example)

The big question around Google FLoC is whether or not some of the FLoC cohorts, which are group identifiers applied by the browser and shared with all sites, will match up with membership in legally protected groups of people. Will cohorts turn out to be the web version of Facebook's old Ethnic Affinity Groups, also known as multicultural affinity groups?

2022 update: "Browsing behavior correlates with race, but cohorts do not.... We did not find with our t-closeness analysis that the likelihood of correlating racial background with cohorts, using the FLoC OT algorithm, was any greater than chance." (Privacy Limitations Of Interest-based Advertising On The Web: A Post-mortem Empirical Analysis Of Google's FLoC, by Berke and Calacci)

Facebook limited the ability of advertisers to exclude members of these groups in 2018 and made many of the groups unusable for targeting at all in 2020. But FLoC is a little different. It assigns numbers, not names, to cohorts, so the unsolved problem is how to tell which cohorts, if any, are actually ethnic affinity groups. One issue on GitHub asks,

If we do have an issue where racially specific targeting is incidentally created by the ML system what happens when advertisers target for or against it and who ends up responsible?

FLoC developers are planning to use sensitive-page classifiers to check which cohorts match up to sensitive groups of pages in web history. Unfortunately, checking page content is not going to reveal the users' protected-group membership. A simple US-based example is school and neighborhood patterns. A school that is mainly attended by members of a single ethnic group is going to have page content that's mostly the same as all the other schools in the district. The schools all have similar events and play the same sports, but serve different groups of students and parents. So even though the content is non-sensitive, the cohort is. And local stores with similar merchandise in different neighborhoods are going to get different ethnic affinity groups, I mean cohorts, of visitors. Content in language A could be completely non-sensitive, and local content for region B could be completely non-sensitive, but the cohort of people who use language A in region B could be highly sensitive.
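The neighborhood example can be sketched in a few lines of code. This is a toy, hypothetical simulation, not the actual FLoC algorithm: it uses a bare-bones SimHash over made-up domain names (FLoC's real clustering is more involved). The point it illustrates is that two browsing histories whose pages are all individually non-sensitive, and near-identical in content, still land in different numeric cohorts, so a per-page sensitivity check would pass both while the cohort split tracks neighborhood.

```python
import hashlib

def simhash_cohort(domains, bits=32):
    """Toy SimHash over visited domains -- a simplified stand-in
    for FLoC-style locality-sensitive clustering (illustration only)."""
    votes = [0] * bits
    for d in domains:
        h = int(hashlib.md5(d.encode()).hexdigest(), 16)
        for i in range(bits):
            # Each domain's hash "votes" on each fingerprint bit.
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, v in enumerate(votes) if v > 0)

# Two neighborhoods. Every page is individually non-sensitive (a high
# school, a grocery store, a parks page), and a content classifier
# would label both histories the same way.
north_history = ["northside-high.example", "grocery-north.example", "parks-north.example"]
south_history = ["southside-high.example", "grocery-south.example", "parks-south.example"]

print(simhash_cohort(north_history), simhash_cohort(south_history))
```

The two cohort numbers come out different even though the page content is equivalent; if the two neighborhoods differ by ethnicity, the opaque cohort number carries that signal too.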

So it might look like nobody will be able to tell which cohorts are really ethnic affinity groups until some independent data journalism site manages to do a study with a panel of opted-in users. This would be the kind of AI ethics research that is bad for career prospects at Google, but that independent organizations can often come up with the funding to do.

But one company doesn't have to wait for the study and resulting news story. Facebook has enough logged-in Google Chrome users that they could already know which FLoC cohorts match up to their old ethnic affinity groups. If a brand buys ads on the open web and relies on FLoC data, Facebook can see when the brand is doing crimes. This doesn't mean that Facebook will disclose the problem, since it gives them something to hold over the brand. No more making any stink about ad metrics or Facebook Groups IRL get-togethers. The extra risk for the advertisers means lower expected revenue for ad impressions tied to FLoC—because of uncertainties that are hard for anyone else to see.

Inspiration for the title for this post:

Your probabilistic ID is just fingerprinting. Change my mind.

— Stephanie Layser (@slayser8) Twitter, January 27, 2021

How Facebook got addicted to spreading misinformation

It Turns Out You Can Bullshit A Bullshitter After All

How to poison the data that Big Tech uses to surveil you