The death of RAG

RAG vs Long Context Length

Louis-François Bouchard, Francois Huppe-Marcoux, and Omar Solano ∙ Jul 11, 2024 ∙ Paid

Good morning everyone!

Today, we’re diving into “the death of RAG.”

Many clients told us, “But why would I use RAG if Gemini can process millions of tokens as input?”

So, is RAG dead?

With the rapid advancements in LLMs, and especially their ever-growing context windows (the amount of input text a model can take at once), many people now believe RAG is no longer necessary once you have a Long Context model. For example, OpenAI’s gpt-4-0314 model from March 14th, 2023, could only process up to 8k tokens. Today, gpt-4o handles up to 128k tokens, and gemini-1.5-pro up to 2M tokens, which is roughly 3,000 pages of text!
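As a quick sanity check on that figure, here’s the back-of-envelope arithmetic (a rough sketch; the ~0.75 words per token and ~500 words per page ratios are common approximations, not exact values):

```python
# Back-of-envelope: how many pages fit in a 2M-token context window?
# Assumes ~0.75 words per token and ~500 words per page — common
# rules of thumb, not exact figures.
context_tokens = 2_000_000        # gemini-1.5-pro's context window
words = context_tokens * 0.75     # ≈ 1.5M words
pages = words / 500               # ≈ 3,000 pages
print(f"{pages:,.0f} pages")      # -> 3,000 pages
```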

We'll demystify the differences between RAG and sending all of your data directly in the model's input, explaining why we believe RAG will remain relevant for the foreseeable future. This will help you determine whether RAG is suitable for your application.
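To make the contrast concrete, here is a minimal sketch of the two approaches. Everything in it is illustrative: `call_llm` is a hypothetical stand-in for any chat-completion API, and the toy keyword-overlap retriever stands in for a real embedding-based search.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for an LLM API call — swap in your provider's client.
    return f"<answer based on a {len(prompt):,}-character prompt>"

def retrieve(question: str, documents: list[str], top_k: int = 3) -> list[str]:
    # Toy retriever: rank documents by word overlap with the question.
    # A real RAG system would use embeddings and a vector index instead.
    q_words = set(question.lower().split())
    ranked = sorted(documents, key=lambda d: -len(q_words & set(d.lower().split())))
    return ranked[:top_k]

def answer_long_context(question: str, documents: list[str]) -> str:
    # Long Context approach: stuff the entire corpus into the prompt.
    # Simple, but cost and latency grow with corpus size, and it only
    # works while everything still fits in the context window.
    prompt = "\n\n".join(documents) + f"\n\nQuestion: {question}"
    return call_llm(prompt)

def answer_rag(question: str, documents: list[str]) -> str:
    # RAG approach: retrieve only the most relevant chunks first.
    # The prompt stays small no matter how large the corpus grows.
    prompt = "\n\n".join(retrieve(question, documents)) + f"\n\nQuestion: {question}"
    return call_llm(prompt)
```

The key difference to notice: the long-context prompt grows linearly with your corpus, while the RAG prompt is bounded by `top_k`, regardless of how much data you have.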

About RAG…
