
Embeddings + Knowledge Graphs: The Ultimate Tools for RAG Systems

Anthony Alcaraz
Towards Data Science
10 min read · Nov 14, 2023


Artificial intelligence software was used to enhance the grammar, flow, and readability of this article’s text.

The advent of large language models (LLMs), trained on vast amounts of text data, has been one of the most significant breakthroughs in natural language processing. The ability of these models to generate remarkably fluent and coherent text from just a short prompt has opened up new possibilities for conversational AI, creative writing, and a wide array of other applications.

However, despite their eloquence, LLMs have some key limitations. Their knowledge is restricted to patterns discerned from the training data, which means they lack true understanding of the world.

Their reasoning ability is also limited: they cannot perform logical inference or synthesize facts from multiple sources. As we ask more complex, open-ended questions, the responses start becoming nonsensical or contradictory.

To address these gaps, there has been growing interest in retrieval-augmented generation (RAG) systems. The key idea is to retrieve relevant knowledge from external sources to provide context for the LLM to make more informed responses.

Most existing systems retrieve passages using the semantic similarity of vector embeddings. However, this approach has drawbacks of its own: passages that are topically similar may not be truly relevant, facts cannot be aggregated across multiple passages, and there is no explicit chain of reasoning connecting the retrieved evidence to the answer.
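To make the retrieval step concrete, here is a minimal sketch of embedding-based passage retrieval. The passages, the query, and the bag-of-words `embed` function are all hypothetical stand-ins; a real system would use a learned embedding model and an approximate nearest-neighbor index, but the ranking mechanics are the same.

```python
import numpy as np

# Toy corpus of passages (illustrative only).
PASSAGES = [
    "LLMs generate fluent text from short prompts.",
    "Knowledge graphs store entities and their relations.",
    "Embeddings map text to dense vectors for similarity search.",
]

def embed(text, vocab):
    # Bag-of-words vector: a crude stand-in for a neural embedding model.
    words = text.lower().replace(".", "").split()
    return np.array([words.count(w) for w in vocab], dtype=float)

def retrieve(query, passages, k=1):
    # Rank passages by cosine similarity between query and passage vectors.
    vocab = sorted({w for t in passages + [query]
                    for w in t.lower().replace(".", "").split()})
    q = embed(query, vocab)
    scores = []
    for p in passages:
        v = embed(p, vocab)
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        scores.append(q @ v / denom if denom else 0.0)
    top = np.argsort(scores)[::-1][:k]
    return [passages[i] for i in top]

print(retrieve("how do embeddings enable similarity search", PASSAGES))
```

Note that the ranking is driven purely by vector proximity: nothing here checks whether a retrieved passage actually supports an answer, or combines facts spread across several passages, which is exactly the gap the rest of this article addresses.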
