There is an exciting paper (Sept. 2022) with Google participation showing that compositionality in the design of LLM prompts can significantly improve the semantic capabilities of LLMs!
The study shows how an LLM prompted to decompose natural language queries into their grammatical constituents produces much better SPARQL translations of these queries. Their technique, called dynamic least-to-most prompting, is demonstrated on the challenging CFQ dataset in the domain of movie production.
This is exactly what you can do with TextVerstehen training and test data: teach your language model to decompose natural language input linguistically!
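To give a flavor of the idea, here is a minimal sketch of least-to-most-style decomposition. The question is merely CFQ-style, and the prompt wording and the `call_llm` placeholder are our own assumptions, not the paper's actual prompts:

```python
# Minimal sketch of least-to-most-style prompting (illustrative only).
# call_llm is a placeholder for whatever completion API you use.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

question = "Did M1's director and producer found a company?"

# Step 1: have the model decompose the question into its constituents.
decomposition = call_llm(
    "Decompose the following question into its grammatical constituents, "
    "one per line:\n" + question
)

# Step 2: build the SPARQL query compositionally, feeding the
# decomposition back in as intermediate steps.
sparql = call_llm(
    "Translate the question into SPARQL, using the decomposition below "
    "as intermediate steps.\nQuestion: " + question +
    "\nDecomposition:\n" + decomposition
)
print(sparql)
```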
By Sebastian Goeser
Some of you may have seen my LinkedIn post on semantics for LLMs https://www.linkedin.com/feed/update/urn:li:activity:7140303280915542018 on the Enkeltrick (grandchild trick) test case, which challenges the inferential capabilities of generative LLMs. In this blog post, we'll dive a little deeper into this test case, discuss its background, and propose a knowledge graph solution for it.
First, let us depict the case as clearly as possible. The following figure shows the Enkeltrick test case (now in English) in a RAG scenario: essentially, we tell our LLM (possibly using explicit prompt instructions) to use the natural language context as a resource for answering the question. Note that this context delivers all the information required to answer the question. The three models, which we will call Famous, Well-known, and Hyped, perform quite differently, with Hyped being clearly the worst. We use default temperature settings, which cause Famous and Well-known to overgenerate somewhat, and to behave non-deterministically across different runs with the same data.
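Since the figure is not reproduced in text here, the following sketch shows how such a RAG prompt might be assembled. The family facts, the question, and the prompt wording are illustrative stand-ins for the actual test case, not its exact content:

```python
# Illustrative RAG prompt for the Enkeltrick test case. The facts below
# are assumed stand-ins for the context shown in the figure.

context = (
    "Anna has two children, Ben and Clara. "
    "Ben has one child, David. Clara has no children."
)
question = "The caller claims to be Anna's grandchild Emil. Can this be true?"

prompt = (
    "Use only the following context to answer the question. "
    "Do not mention the context in your answer.\n\n"
    f"Context: {context}\n\nQuestion: {question}"
)
```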
Now let's take a more principled view: how do we do inferential semantics in a transformer-based LLM in a RAG scenario, as in the above example? Well, we apply a kind of distributional feature semantics, obtained from our word embeddings, to both query and context, and rely on accurate next-word prediction to draw these inferences. Which, as we all know, comes surprisingly close to a real-world semantics.
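As a toy illustration of what "distributional feature semantics" amounts to in practice: meaning becomes proximity in embedding space. The three-dimensional vectors below are invented purely for illustration:

```python
# Toy illustration of distributional semantics: words as vectors,
# meaning as geometric proximity. The vectors are made up.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

grandchild = np.array([0.8, 0.3, 0.1])
grandson   = np.array([0.7, 0.4, 0.1])
invoice    = np.array([0.1, 0.1, 0.9])

print(cosine(grandchild, grandson))  # high: semantically close
print(cosine(grandchild, invoice))   # low: semantically distant
```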
But what exactly does this mean in terms of knowledge representation (KR)? Note that the critical information in the Enkeltrick scenario is symbolic KR: just true or false (assuming there is no such thing as an uncertain grandchild relation).
In the following architecture diagram, we distinguish KR outside of the LLM (A) from KR inside it (B). Scenario A is what we know from RAG, where the repository itself is often enough a probabilistic resource, such as a vector space document repository. But it may be a KG repository as well, as we will see later on. In scenario B, the critical information is stored in the LLM, possibly trained in from an (additional) document or a KG repository. It is obvious, from our point of view, that the LLM+ in B cannot be a better representation of KG information than the KG itself, which is why TextVerstehen focuses mainly on scenario A.
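A minimal sketch of scenario A, with `retrieve` and `call_llm` as placeholders for a repository lookup and an LLM client (both assumed, not a fixed API):

```python
# Scenario A: the knowledge lives outside the model and is retrieved
# per query. retrieve and call_llm are placeholders.

def retrieve(query: str) -> str:
    """Look up relevant facts in an external repository (vector store or KG)."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def answer(query: str) -> str:
    context = retrieve(query)  # KR stays outside the LLM
    return call_llm(f"Context: {context}\n\nQuestion: {query}")

# Scenario B, by contrast, would fine-tune the facts into the model's
# weights, so no retrieval step appears at inference time.
```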
Now, let's assume that, through hard work plus some magic, we have obtained the following KG, which represents the critical Enkeltrick information.
This KG consists, first, of a set of triples representing, in this case, binary relationships between entities, with the entities referring to resources such as persons. Note that RDF allows resources to be unnamed, as is the case for the entities "someOne1", "someOne2", etc. Secondly, a KG has inference rules; the rule shown above says, essentially, that the grandchild of someone is a child of that person's child. (Note that real KG inference rules may be more pattern-oriented than this one.)
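For concreteness, here is a minimal reconstruction of such a KG with rdflib. The namespace, predicate names, and the concrete facts are assumptions based on the description above, and the inference rule is expressed as a SPARQL update:

```python
# Minimal KG reconstruction with rdflib; names and facts are assumed.
from rdflib import Graph

ttl = """
@prefix ex: <http://example.org/> .

_:someOne1 ex:childOf ex:Anna .     # unnamed (blank-node) resources,
_:someOne2 ex:childOf _:someOne1 .  # as RDF allows
"""

g = Graph()
g.parse(data=ttl, format="turtle")

# The inference rule: the grandchild of someone is a child of that
# person's child, expressed as a SPARQL INSERT.
g.update("""
PREFIX ex: <http://example.org/>
INSERT { ?gc ex:grandchildOf ?gp }
WHERE  { ?gc ex:childOf ?c . ?c ex:childOf ?gp . }
""")

for s, p, o in g:
    print(s, p, o)
```

Running the update materializes the grandchild triples explicitly, so the retrieval step can hand the LLM ready-made facts instead of facts it would have to infer itself.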
The last step we need to take now is to draw inferences based on our query and submit the knowledge graph triples thus obtained as context to our LLM. As mentioned above, it may be beneficial to use proper instructions telling the LLM what is context and what is query, respectively, and to instruct it not to mention the context in the answer. Various prompting techniques might be tried at this point: we can, e.g., instruct the model to select a proper subset of triples first and then, in a second step, instruct it to render these selected triples in natural language.
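A sketch of this two-step variant might look as follows; `call_llm` is again a placeholder and the prompt wording is our own, not a fixed recipe:

```python
# Two-step prompting over KG triples (sketch; call_llm is a placeholder).

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def answer_over_kg(triples: str, question: str) -> str:
    # Step 1: let the model select the triples relevant to the question.
    selected = call_llm(
        "From the triples below, select only those needed to answer the "
        f"question.\nTriples:\n{triples}\nQuestion: {question}"
    )
    # Step 2: let it render the selected triples as a natural language
    # answer, without mentioning the provided context.
    return call_llm(
        "Answer the question in natural language from these facts, and "
        f"do not mention the facts themselves.\nFacts:\n{selected}\n"
        f"Question: {question}"
    )
```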
As you can easily try out yourself, this works surprisingly well: Famous consistently delivers the same correct answer, Well-known continues to be verbose but now delivers the correct information, and even Hyped comes up with the right, albeit extremely short, answer.
While this is not a proof or even substantial evidence, it is in line with what research has found about the use of KGs with LLMs: KG-based question answering combines the complementary strengths of LLMs and KGs, delivering substantially better results than simpler RAG techniques. If you want to find out more about KG question answering, please don't hesitate to contact the TextVerstehen team.