Community Notes 2
Last time I discussed a mock-data analysis of an old version of Twitter's Community Notes ranking algorithm. I have finally gotten around to converting that code's data generation into Python. I haven't replicated Ben's visualizations yet, nor am I re-implementing any of the ranking code such as calculating preliminary note scores; for now it's just the mock user/post/note/ratings data.
Rather than replicate the whole process, I decided to start by feeding the raw ratings data directly to the matrix factorization algorithm.
Embedding Consistency
The function test_one_dim_moderate_statement_wins uses mock data hand-generated in the following way (a sketch of the resulting ratings table follows the list):
There are three notes, “Team A Propaganda”, “Moderate Statement”, “Team B Propaganda”
There are six users, “Team A radical”, “Team A moderate”, “Team B radical”, “Team B moderate”, “Moderate A sympathizer”, “Team A semi-radical”
the radicals mark their team’s propaganda as helpful and all else as unhelpful
the moderates mark their team’s propaganda and the moderate statement as helpful
the Moderate A sympathizer marks Team A Propaganda as somewhat helpful and the Moderate Statement as helpful, while the Team A semi-radical does the reverse (Team A Propaganda helpful, Moderate Statement somewhat helpful)
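As a concrete sketch, that ratings table can be written down by hand. The 1.0/0.5/0.0 helpfulness encoding and the column names below are my shorthand, loosely following the Community Notes ratings format rather than whatever the test actually uses:

```python
import pandas as pd

# Hand-built ratings for the case above. 1.0 = helpful, 0.5 = somewhat
# helpful, 0.0 = unhelpful; only the ratings described in the list are included.
ratings_rows = [
    # radicals: own team's propaganda helpful, everything else unhelpful
    ("Team A radical", "Team A Propaganda", 1.0),
    ("Team A radical", "Moderate Statement", 0.0),
    ("Team A radical", "Team B Propaganda", 0.0),
    ("Team B radical", "Team B Propaganda", 1.0),
    ("Team B radical", "Moderate Statement", 0.0),
    ("Team B radical", "Team A Propaganda", 0.0),
    # moderates: own team's propaganda and the moderate statement helpful
    ("Team A moderate", "Team A Propaganda", 1.0),
    ("Team A moderate", "Moderate Statement", 1.0),
    ("Team B moderate", "Team B Propaganda", 1.0),
    ("Team B moderate", "Moderate Statement", 1.0),
    # the last two users, as described above
    ("Moderate A sympathizer", "Team A Propaganda", 0.5),
    ("Moderate A sympathizer", "Moderate Statement", 1.0),
    ("Team A semi-radical", "Team A Propaganda", 1.0),
    ("Team A semi-radical", "Moderate Statement", 0.5),
]
ratings = pd.DataFrame(
    ratings_rows, columns=["raterParticipantId", "noteId", "helpfulNum"]
)
```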
This creates a one-dimensional political spectrum with a somewhat independent “willingness to moderate” spectrum. One-dimensional embeddings cause the Moderate Statement to consistently be rated highest by the factorization, while two-dimensional embeddings do not consistently favor the Moderate Statement over Team A Propaganda. This is the example I referred to in 1.1 in part 1.
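For intuition, here is a toy version of that factorization, continuing from the ratings table sketched above. It fits the usual intercept-plus-factor rating model by plain gradient descent in numpy; the function name, hyperparameters, and regularization weights are my own choices and not the scorer's, so treat it as a sketch of the technique rather than the test itself.

```python
import numpy as np

def fit_toy_factorization(ratings, n_dims=1, n_iters=3000, lr=0.05,
                          reg_intercept=0.15, reg_factor=0.03, seed=0):
    """Toy rating model: mu + rater_intercept + note_intercept
    + dot(rater_factors, note_factors), fit by gradient descent.
    A sketch of the general technique, not the production scorer."""
    rng = np.random.default_rng(seed)
    r_idx = {r: i for i, r in enumerate(sorted(ratings["raterParticipantId"].unique()))}
    n_idx = {n: i for i, n in enumerate(sorted(ratings["noteId"].unique()))}

    u = ratings["raterParticipantId"].map(r_idx).to_numpy()
    v = ratings["noteId"].map(n_idx).to_numpy()
    y = ratings["helpfulNum"].to_numpy(dtype=float)

    mu = 0.0
    rater_int = np.zeros(len(r_idx))
    note_int = np.zeros(len(n_idx))
    rater_fac = rng.normal(scale=0.1, size=(len(r_idx), n_dims))
    note_fac = rng.normal(scale=0.1, size=(len(n_idx), n_dims))

    for _ in range(n_iters):
        pred = mu + rater_int[u] + note_int[v] + (rater_fac[u] * note_fac[v]).sum(axis=1)
        err = pred - y

        # accumulate per-parameter gradients of the mean squared error
        g_ri = np.zeros_like(rater_int)
        g_ni = np.zeros_like(note_int)
        g_rf = np.zeros_like(rater_fac)
        g_nf = np.zeros_like(note_fac)
        np.add.at(g_ri, u, err)
        np.add.at(g_ni, v, err)
        np.add.at(g_rf, u, err[:, None] * note_fac[v])
        np.add.at(g_nf, v, err[:, None] * rater_fac[u])

        mu -= lr * err.mean()
        rater_int -= lr * (g_ri / len(y) + reg_intercept * rater_int)
        note_int -= lr * (g_ni / len(y) + reg_intercept * note_int)
        rater_fac -= lr * (g_rf / len(y) + reg_factor * rater_fac)
        note_fac -= lr * (g_nf / len(y) + reg_factor * note_fac)

    # the note intercept plays the role of the note's helpfulness score
    return {note: note_int[i] for note, i in n_idx.items()}

# The claim above: with n_dims=1 the Moderate Statement consistently wins,
# while with n_dims=2 it does not always beat Team A Propaganda.
for seed in range(5):
    scores = fit_toy_factorization(ratings, n_dims=1, seed=seed)
    print(seed, max(scores, key=scores.get))
```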
Twitcher Identification
When I ran the matrix factorization on data mocked using Ben's process, the two user types were not that easy to tell apart:
The internalRaterFactor1 values, which are intended to capture political-spectrum variation among raters, are not distinguishable between birders and twitchers, with basically total overlap between the two sets. Twitchers did have noticeably lower internalRaterIntercepts, with significantly less variance.
This lack of difference seems surprising, though note that in this mock data the twitchers, who are coordinating to promote political misinformation, otherwise behave the same as birders and leave no other markers of their activity. The default choice of gamma=0.1 from that post is also odd here, as it means twitchers prefer to stay away from political posts, which by default make up 0.25 of all posts.
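For reference, the comparison behind those observations is just a groupby over the fitted rater parameters. The values below are placeholders, and raterType is a label I'm assuming gets carried over from the mock-data generation rather than anything the scorer outputs:

```python
import pandas as pd

# Placeholder rows standing in for the real factorization output (one row per
# rater), joined against the mock-data generator's rater labels.
rater_params = pd.DataFrame({
    "raterParticipantId": ["r1", "r2", "r3", "r4"],
    "internalRaterFactor1": [0.8, -0.6, 0.1, 0.2],
    "internalRaterIntercept": [0.05, 0.02, -0.30, -0.25],
})
rater_labels = pd.DataFrame({
    "raterParticipantId": ["r1", "r2", "r3", "r4"],
    "raterType": ["birder", "birder", "twitcher", "twitcher"],
})

compare = rater_params.merge(rater_labels, on="raterParticipantId")
print(compare.groupby("raterType")[["internalRaterFactor1", "internalRaterIntercept"]]
      .agg(["mean", "std"]))
```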
Recomputing with gamma=0.3, and twitcher note and ratings multipliers of 5, to match a case when twitchers are able to overcome filters based on helpfulness, we find:
We now see major differences in internalRaterFactor1, as well as a big difference in internalRaterIntercept, which should mean that twitchers rating things as “helpful” is significantly discounted in value.
It’s good to see that if variables change to make twitchers a more serious source of misinformation, this technique is better at detecting them!
Another important note: when the twitcher note and rating multipliers are high, even just at 2, the proportion of twitchers included in the factorization is much higher than their proportion in the population, in this case jumping from 10% of the population to 33% of the evaluation group, more than the 2x you would expect from the difference in number of ratings alone. This is to be expected, because note ratings are only evaluated when the note marks the post as misleading. For birders, the note will be helpful on a misleading post about 10% of the time (the proportion of posts that are actually misleading, minus 5% of that for birder note errors, minus 5% for birder rating errors, plus 5% of 5% for double errors, adds up to 9.25%, so 2.3125% of posts will have notes added and be in politics), whereas twitchers write and rate misleading notes on true posts to make them seem false (90% base rate true; 25% of birder ratings * 5% error + 30% of twitcher ratings gives 3.1725% of posts with notes added and in politics), which is 15% more ratings worth analyzing overall. The expected 2.3x ratio is still a bit off from the observed 3.3x; maybe I'm miscounting somewhere?
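Spelling the birder-side arithmetic out in code, since I'm not confident in it: the additive form is exactly what the paragraph above describes, and I've included an independent-errors product as an alternative reading of the same assumptions.

```python
# Birder-side arithmetic from the paragraph above, using the stated
# assumptions: 10% of posts are misleading, 5% birder note errors,
# 5% birder rating errors, politics is 25% of posts.
p_misleading = 0.10
note_err = 0.05
rating_err = 0.05
p_politics = 0.25

# additive form as written above: 10% - 0.5% - 0.5% + 0.25% = 9.25%
additive = (p_misleading
            - note_err * p_misleading
            - rating_err * p_misleading
            + note_err * rating_err)
print(additive)               # 0.0925
print(additive * p_politics)  # 0.023125, the 2.3125% figure

# treating the two 5% error rates as independent filters on the 10% of
# misleading posts instead gives a slightly smaller number
independent = p_misleading * (1 - note_err) * (1 - rating_err)
print(independent)            # 0.09025 (0.0225625 after the politics share)
```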
Next steps
There is still a substantial chunk of work to do in aligning the mock data generation with the real data wrangling, adding data visualization, and, perhaps most of all, using GitHub properly to put the code somewhere suitable for development and re-use.
But I don’t care about those things and want to move on to questions about LLM use cases, so I’m going to do that.