Good morning everyone!
Today, we’re diving into “the death of RAG.”
Many clients told us, “But why would I use RAG if Gemini can process millions of tokens as input?”
So, is RAG dead?
With the rapid advancements in LLMs, and especially their growing context windows (the amount of input text they can process), many people now think RAG is no longer necessary with long-context models. For example, OpenAI’s gpt-4-0314 model, released on March 14th, 2023, could only process up to 8k tokens. Now, gpt-4o can process up to 128k tokens, while gemini-1.5-pro can process up to 2M tokens, roughly 3,000 pages of text!
We'll demystify the differences between RAG and sending all your data directly in the prompt, and explain why we believe RAG will remain relevant for the foreseeable future. This will help you determine whether RAG is suitable for your application.
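To make the contrast concrete, here's a minimal sketch of the two approaches. The toy word-overlap scorer below is just a stand-in for a real embedding-based retriever, and the documents and query are invented for illustration; the point is only to show that RAG sends a small, relevant subset to the model, while the long-context approach sends everything.

```python
# Minimal sketch: "send everything" vs. RAG-style retrieval.
# The word-overlap scorer is a toy stand-in for a real embedding model.

def score(query: str, chunk: str) -> int:
    """Toy relevance score: how many query words appear in the chunk."""
    q_words = {w.strip(".,?!").lower() for w in query.split()}
    return sum(1 for w in chunk.split() if w.strip(".,?!").lower() in q_words)

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """RAG step: keep only the top-k most relevant chunks."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

# Hypothetical document collection.
docs = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
    "Quarterly revenue grew 12% year over year.",
    "Employee onboarding takes two weeks.",
]
query = "What is the capital of France?"

# Long-context approach: the prompt contains every document.
long_context_prompt = "\n".join(docs) + "\nQuestion: " + query

# RAG approach: the prompt contains only the retrieved chunks.
rag_prompt = "\n".join(retrieve(query, docs)) + "\nQuestion: " + query

print(f"long-context: {len(long_context_prompt)} chars, "
      f"RAG: {len(rag_prompt)} chars")
```

With a 4-document corpus the savings are trivial, but with millions of tokens of company data, retrieval is what keeps the prompt small, cheap, and focused.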
About RAG…