In Retrieval-Augmented Generation (RAG) systems, when the final answer is incorrect, it's crucial to determine whether the error stems from retrieval or generation. Here's how to tell the two apart:
- Retrieval failure: The retrieved documents lack relevant information. Check retrieval metrics like recall, precision, and mean reciprocal rank. If the top-ranked documents are irrelevant, the retriever is at fault.
- Generation failure: The retrieved documents contain the correct information, but the LLM fails to use it properly. This often manifests as hallucination, contradiction of the context, or unsupported additions. Evaluate faithfulness and answer relevancy to confirm.
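The retrieval metrics mentioned above are straightforward to compute once you have labeled relevant documents per query. A minimal sketch (the document IDs and data shapes here are illustrative assumptions, not from any specific library):

```python
def recall_at_k(relevant: set, retrieved: list, k: int) -> float:
    """Fraction of relevant doc IDs that appear in the top-k retrieved IDs."""
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)

def mean_reciprocal_rank(queries: list) -> float:
    """queries: list of (relevant_ids, retrieved_ids) pairs.
    For each query, take 1/rank of the first relevant hit (0 if none)."""
    total = 0.0
    for relevant, retrieved in queries:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(queries)
```

Low recall@k or MRR on a labeled query set points at the retriever before you ever look at generated answers.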
Common diagnostic steps:
- Manually inspect the retrieved context for each query.
- If context is good but answer is bad → generation failure.
- If context is missing or wrong → retrieval failure.
Fix retrieval failures by tuning chunk size, swapping embedding models, or adding a re-ranker. Fix generation failures with better prompts, fine-tuning, or output constraints.
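To illustrate the re-ranking idea, here is a toy second-stage ranker using lexical overlap. In practice you would use a learned cross-encoder; this stdlib-only sketch only shows where a re-ranker sits in the pipeline, and the scoring function is a deliberate simplification:

```python
def rerank(query: str, chunks: list, top_n: int = 3) -> list:
    """Reorder retrieved chunks by token overlap with the query.

    A stand-in for a cross-encoder re-ranker: the first-stage retriever
    returns candidates, and this second stage re-scores them.
    """
    q_tokens = set(query.lower().split())

    def score(chunk: str) -> float:
        c_tokens = set(chunk.lower().split())
        return len(q_tokens & c_tokens) / (len(c_tokens) or 1)

    return sorted(chunks, key=score, reverse=True)[:top_n]
```

Because the re-ranker sees the full query and chunk text together, it can demote chunks that merely shared embedding-space neighborhoods with the query.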