Twitter’s Community Notes
One of my major passions in life is epistemology, so I’ve been thinking lately about the field’s biggest recent practical achievement: Twitter’s Community Notes feature.
Since the code is publicly available, I want to try to build a more theoretical understanding of what appears to be a very effective feature in practice.
I would like to understand the system from a few perspectives:
- From a theoretical, mathematical perspective, what properties do the user and note factor embeddings have?
  - For example, it was easy to produce a toy case in which the preferred note (using matrix factorization alone) is stably different with a one-dimensional embedding than with a two-dimensional one (a sketch of this kind of experiment appears after this list).
- Can we build a social model corresponding to the mathematical facts above?
- How well does this work as a matter of practical security, for example following Benjamin Crisman’s work modeling coordinated abuse?
  - His post, which analyzed an early version of the model, was cited in the 2022 paper as part of the reasoning behind using user and note embeddings. I intend to replicate it against the current version of the model to see how the embeddings affect his toy model (a crude version is sketched after this list).
- Can this model be applied outside of the Community Notes context? In particular, can a “crowd wisdom” type approach be used to validate LLM output?
  - Can a variety of LLM “personas” give as much breadth of perspective as necessary?
  - What is the computational cost of such a strategy?
  - How many critiques are needed for what level of coverage?
  - Is the number of critiques needed predictable from the prompt?
  - Can an LLM pipeline be structured to make generating different perspectives more efficient, e.g. by isolating the “persona” component so that the shared context doesn’t need to be recomputed? (See the prompt-structure sketch after this list.)
  - This overall approach is similar to random forest aggregation of decision trees; are there lessons from those models that apply here?
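
To make the first question concrete, here is a minimal sketch of the core model as I understand it from the published code and paper: each rating is predicted as a global intercept plus a user intercept, a note intercept, and a dot product of user and note factor embeddings. Everything below (the toy data, the hyperparameters, and the plain SGD `fit()` function) is my own illustrative assumption rather than the production implementation; the point is just to compare the fitted note intercepts at k=1 vs. k=2.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit(ratings, k, lam=0.03, lam_intercept=0.15, lr=0.05, steps=2000):
    """Fit rating ~ mu + b_u + b_n + f_u . f_n by plain SGD.

    ratings: list of (user, note, value) triples. Values here are +1/-1
    for helpful/not-helpful; the real system uses a different rating
    scale (roughly 1 / 0.5 / 0), so this is purely illustrative.
    """
    n_users = 1 + max(u for u, _, _ in ratings)
    n_notes = 1 + max(n for _, n, _ in ratings)
    mu = 0.0
    bu = np.zeros(n_users)
    bn = np.zeros(n_notes)
    fu = 0.1 * rng.standard_normal((n_users, k))
    fn = 0.1 * rng.standard_normal((n_notes, k))
    for _ in range(steps):
        for u, n, r in ratings:
            err = (mu + bu[u] + bn[n] + fu[u] @ fn[n]) - r
            mu -= lr * err
            bu[u] -= lr * (err + lam_intercept * bu[u])
            bn[n] -= lr * (err + lam_intercept * bn[n])
            # Simultaneous update of both factor vectors.
            fu[u], fn[n] = (fu[u] - lr * (err * fn[n] + lam * fu[u]),
                            fn[n] - lr * (err * fu[u] + lam * fn[n]))
    return mu, bu, bn, fu, fn

# Toy data: two camps of four users each. They disagree on notes 0 and 1
# along camp lines, and both camps agree that note 2 is helpful.
ratings = ([(u, 0, +1) for u in range(4)] + [(u, 0, -1) for u in range(4, 8)]
           + [(u, 1, -1) for u in range(4)] + [(u, 1, +1) for u in range(4, 8)]
           + [(u, 2, +1) for u in range(8)])

for k in (1, 2):
    _, _, bn, _, _ = fit(ratings, k)
    print(f"k={k}: fitted note intercepts:", np.round(bn, 3))
```

As I understand the published scoring criteria, a note needs (among other things) an intercept above roughly 0.4 to be shown as helpful, so shifts in these intercepts between embedding dimensions are exactly what can flip which note is preferred.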
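Reusing the same toy `fit()` from above, here is a crude version of the coordinated-abuse question: a block of sockpuppet accounts that each rate only one target note, all marking it helpful. This is my own simplification for building intuition, not Crisman’s actual model; the thing to watch is how much of the manufactured consensus lands in the note’s intercept versus being absorbed by the factor term.

```python
# Crude coordinated-abuse probe: six new accounts (ids 8..13) each rate only
# the target note, all +1, on top of the organic toy ratings above.
target = 3
sock_ratings = ratings + [(8 + i, target, +1) for i in range(6)]

for k in (1, 2):
    _, _, bn, _, _ = fit(sock_ratings, k)
    print(f"k={k}: target note intercept = {bn[target]:+.3f}")
```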
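Finally, the persona-isolation idea is, at bottom, a question of prompt structure: if the long shared context (the draft under critique plus the instructions) forms a common prefix and only a short persona block varies at the end, inference stacks that support prefix/KV caching can reuse the shared computation across personas. In the sketch below, `generate` is a hypothetical stand-in for whatever model call is actually used, and the persona list is invented for illustration.

```python
from typing import Callable

# Invented personas; in practice these would be chosen for breadth of
# perspective, per the question above.
PERSONAS = [
    "a skeptical statistician who checks every quantitative claim",
    "a domain expert hunting for subtle factual errors",
    "a generalist reader flagging unclear or misleading phrasing",
]

def critique_prompts(draft: str) -> list[str]:
    # Long, shared material first; short, varying persona last. This keeps
    # the common prefix maximal, which is what prefix caching can reuse.
    shared = f"You will critique the following draft answer.\n\nDRAFT:\n{draft}\n\n"
    return [shared + f"Write your critique as {p}." for p in PERSONAS]

def gather_critiques(draft: str, generate: Callable[[str], str]) -> list[str]:
    # `generate` stands in for a real LLM call; with a prefix-caching
    # backend, the shared portion of each prompt is computed only once.
    return [generate(p) for p in critique_prompts(draft)]
```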
Overall there’s a lot to do here and I’m excited to get into the project!