LLM Fact Checking

TL;DR: I don’t trust models to understand even basic semantic claims well enough to check whether they are consistent across sources.



Among the many issues raised by the rapid rise of large language models is the problem of truth and trust. LLMs sometimes “hallucinate” reference material that does not exist. At the same time, trust in “experts” has eroded, with claims of expertise becoming highly politicized. The most interesting recent development in this space is X’s “Community Notes,” which combines traditional reputation screening with a matrix-factorization algorithm to find corrections that will be trusted across the political spectrum.

My initial goal in working on LLMs was to replicate the Community Notes setup: generate many different responses via diverse personas and find a universally popular one. Some of the structure of my code reflects this initial purpose.

I quickly determined that trustworthy citations were a key component of successful Community Notes, and changed focus to retrieval-augmented generation (RAG). However, my results with RAG were inconsistent, even for straightforward questions.

This led me in a different direction, which was to address the semantic content of small amounts of text.

Here, I follow the classical mathematical logic notion of sentences as propositions which propose relations between entities. This is all a fancy way of saying that in a simple sentence like:

“Jack and Jill ran up the hill”

There are entities:

“Jack”, “Jill”, “the hill”

And a relation between them:

“Ran up”

In order to not get stuck up my own ass I decided to work on the most straightforward case here, which was to ask my LLMs to extract from a given sentence (a minimal sketch of this call follows the list):

  1. The nouns in that sentence

  2. The verbs in that sentence

    1. The subject and object of each verb in the sentence
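
To make that concrete, here is a minimal sketch of the extraction call. It assumes the ollama Python client talking to a local Ollama server, and the JSON field names ("nouns", "relations", and so on) are my own illustration rather than a fixed schema:

import json
import ollama  # assumes a local Ollama server and the ollama Python client

EXTRACTION_PROMPT = """Extract from the following sentence:
1. the nouns,
2. the verbs,
3. the subject and object of each verb.
Respond with JSON of the form
{{"nouns": [...], "relations": [{{"verb": "...", "subject": "...", "object": "..."}}]}}
Source text: "{text}"
"""

def extract_relations(text: str, model: str = "gemma3:12b") -> dict:
    # format="json" asks Ollama to constrain the reply to valid JSON
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": EXTRACTION_PROMPT.format(text=text)}],
        format="json",
    )
    return json.loads(response["message"]["content"])

print(extract_relations("Jack and Jill ran up the hill"))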


While I did start by limiting myself to models that run locally on my computer, I was still surprised to run into difficulty this early.

In the first test case I wrote, a two-sentence reference from the Abed Nadir Wikipedia page, no model was able to identify more than 20 of the 24 phrases surrounding the 8 verbs, most models recognized fewer than 14, and all made regular, baffling mistakes.
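
Each relation contributes three phrases (the verb itself, its subject, and its object), which is where the 24 comes from. The scoring works roughly like the sketch below; exact, case-insensitive matching against hand-labeled relations is a simplifying assumption on my part:

# Illustrative scoring: each hand-labeled relation contributes three phrases.
def score_relations(gold: list[dict], predicted: list[dict]) -> tuple[int, int]:
    hits, total = 0, 0
    for gold_rel in gold:
        # Pair each gold relation with the predicted relation sharing its verb, if any.
        match = next(
            (p for p in predicted
             if p.get("verb", "").strip().lower() == gold_rel["verb"].strip().lower()),
            None,
        )
        for field in ("verb", "subject", "object"):
            total += 1
            if match and match.get(field, "").strip().lower() == gold_rel[field].strip().lower():
                hits += 1
    return hits, total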

This was somewhat surprising to me, and I am disappointed that even with the release of gemma3:12b (my current leading model) and several passes of prompt edits I have not made progress, nor did larger models such as Gemini 2.0 Flash or Claude 3.5 Haiku perform significantly better.

However, 20/24 is enough coverage to move to the next step: asking whether the model can validate the relations it identified in the sentence. If it can, then it can at least check whether a text accurately represents the claims made in its citations, setting aside the question of whether the citation itself is trustworthy.

As a first experiment, I iterated through all of the verbs where the model had successfully identified both a subject and an object, then, for each pair out of the three phrases, asked the model to fill in the third, based on the same text it had extracted them from (a sketch of the loop follows the example below):

“””
In the following text, what action does {Jack} take on or with respect to {the hill}?
Source text: “Jack and Jill ran up the hill”
“””
Expected response: “ran up”
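
A rough sketch of that loop, again assuming the ollama client and the relation dicts from the extraction step. The "verb" and "object" templates paraphrase the prompt wordings used in the examples here; the "subject" template is my own extrapolation:

import ollama  # as in the extraction sketch above

# Hold two of (subject, verb, object) fixed and ask the model to recover the third
# from the same source text the relation was extracted from.
CLOZE_PROMPTS = {
    "verb": 'In the following text, what action does "{subject}" take on or with '
            'respect to "{object}"?\nSource text: "{text}"',
    "object": 'Determine who or what it is that "{subject}" most likely takes the '
              'action "{verb}" on or with respect to in the following text.\n'
              'Source text: "{text}"',
    "subject": 'Determine who or what most likely takes the action "{verb}" on or '
               'with respect to "{object}" in the following text.\n'
               'Source text: "{text}"',
}

def reconstruct(relation: dict, text: str, model: str = "gemma3:12b") -> dict:
    guesses = {}
    for missing, template in CLOZE_PROMPTS.items():
        # str.format ignores the keyword argument each template does not use
        prompt = template.format(text=text, **relation)
        response = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
        guesses[missing] = response["message"]["content"].strip()
    return guesses  # compare guesses[k] against relation[k] to score the reconstruction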


At this stage, every model I could run locally had profound failures and baffling mistakes. For example:


Gemma3:4b:
{verb: is, subject: he, object: on the spectrum}
Prompt:
“””
In the following text, what action does {he} take on or with respect to {on the spectrum}?
Source text: “While researching and creating the character of Abed, Community creator Dan Harmon realized he displayed symptoms and behaviors commonly associated with autism spectrum disorder. After consulting a doctor about it, Harmon concluded that he himself is on the spectrum.”
“””
Expected response:
Is
Model guesses:
Realized, consulting, concluded


Another example:

Gemma3:12b:
{verb: displayed, subject: he, object: symptoms}
Prompt:
“””
Determine who or what it is that “he” most likely takes the action “displayed” on or with respect to in the following text.
Source text: “While researching and creating the character of Abed, Community creator Dan Harmon realized he displayed symptoms and behaviors commonly associated with autism spectrum disorder. After consulting a doctor about it, Harmon concluded that he himself is on the spectrum.”
“””
Expected response:
Symptoms and behaviors
Model guesses:
Abed, Dan Harmon, a doctor, he himself


While most of the responses are reasonably good, the errors are extremely basic and common. For example, gemma3:12b only reconstructs 13 of the 15 phrases across the 5 relations it identified, even when using the same text as a reference.

Trying to use this method to check whether a citation accurately represents its source already seems futile. This leaves me at a loss for how to proceed, since the basic mechanisms of understanding the semantic content of claims are barely present.


----------


Since I am using JSON formatting for my outputs, some varieties of prompt engineering are non-trivial to apply.

For example, chain-of-thought prompting runs into problems and actually made my attempts worse: I think the added complexity of instruction outweighed the gains of chain of thought, since most of my questions do not require reasoning beyond re-referencing the text. I do suspect that restructuring the initial “find all verbs with subject and object” prompt to repeat the source text for each verb would get more correct answers, but it would also add a lot of overhead; a rough sketch of that restructuring follows.
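
A minimal, untested sketch of that per-verb restructuring, once more assuming the ollama Python client and its format="json" option; the prompt wording and field names are illustrative:

import json
import ollama

# First ask only for the verbs, then repeat the source text in a separate
# call per verb to get that verb's subject and object.
def extract_relations_per_verb(text: str, model: str = "gemma3:12b") -> list[dict]:
    verbs_prompt = (
        'List every verb in the following text as JSON of the form {"verbs": [...]}.\n'
        f'Source text: "{text}"'
    )
    reply = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": verbs_prompt}],
        format="json",
    )
    verbs = json.loads(reply["message"]["content"]).get("verbs", [])

    relations = []
    for verb in verbs:
        rel_prompt = (
            f'In the following text, who or what is the subject and the object of the verb "{verb}"? '
            'Respond as JSON of the form {"subject": "...", "object": "..."}.\n'
            f'Source text: "{text}"'
        )
        reply = ollama.chat(
            model=model,
            messages=[{"role": "user", "content": rel_prompt}],
            format="json",
        )
        relations.append({"verb": verb, **json.loads(reply["message"]["content"])})
    return relations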


Next: LLM Tokenizer Compression