Despite impressive visual fidelity, current text-to-image (T2I) diffusion models struggle to depict rare, complex, or culturally nuanced concepts due to training-data limitations. We introduce RAVEL, a training-free framework that significantly improves rare-concept generation, context-driven image editing, and self-correction by integrating graph-based retrieval-augmented generation (RAG) into diffusion pipelines. Unlike prior RAG and LLM-enhanced methods that rely on visual exemplars, static captions, or models' pretrained knowledge, RAVEL leverages structured knowledge graphs to retrieve compositional, symbolic, and relational context, enabling nuanced grounding even in the absence of visual priors. To further refine generation quality, we propose SRD, a novel self-correction module that iteratively updates prompts via multi-aspect alignment feedback, enhancing attribute accuracy, narrative coherence, and semantic fidelity. Our framework is model-agnostic and compatible with leading diffusion models, including Stable Diffusion XL, Flux, and DALL-E 3. We conduct extensive evaluations across three newly proposed benchmarks: MythoBench, Rare-Concept-1K, and NovelBench. RAVEL consistently outperforms state-of-the-art (SOTA) methods across perceptual, alignment, and LLM-as-a-Judge metrics. These results position RAVEL as a robust paradigm for controllable and interpretable T2I generation in long-tail domains.
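To make the graph-based retrieval idea concrete, the sketch below shows how relational triples retrieved from a small knowledge graph could be serialized into a context-enriched T2I prompt. The graph schema, entity names, and serialization format here are illustrative assumptions for exposition, not RAVEL's actual implementation.

```python
# Hypothetical sketch: graph-based retrieval for prompt grounding.
# The toy graph, relation names, and prompt format are assumptions,
# not the framework's real data or code.

# Toy knowledge graph: entity -> list of (relation, object) triples.
KNOWLEDGE_GRAPH = {
    "Quetzalcoatl": [
        ("is_a", "feathered serpent deity"),
        ("associated_with", "wind and learning"),
        ("depicted_with", "green quetzal plumes"),
    ],
    "feathered serpent deity": [
        ("originates_from", "Mesoamerican mythology"),
    ],
}

def retrieve_subgraph(entity, graph, depth=2):
    """Collect (subject, relation, object) triples up to `depth` hops out."""
    triples, frontier = [], [entity]
    for _ in range(depth):
        next_frontier = []
        for subj in frontier:
            for rel, obj in graph.get(subj, []):
                triples.append((subj, rel, obj))
                next_frontier.append(obj)
        frontier = next_frontier
    return triples

def augment_prompt(prompt, entity, graph):
    """Serialize retrieved triples into a context-enriched T2I prompt."""
    triples = retrieve_subgraph(entity, graph)
    context = "; ".join(f"{s} {r.replace('_', ' ')} {o}" for s, r, o in triples)
    return f"{prompt}. Context: {context}."

print(augment_prompt("A portrait of Quetzalcoatl", "Quetzalcoatl", KNOWLEDGE_GRAPH))
```

Because the retrieved context is symbolic rather than visual, this kind of grounding works even when no reference image of the concept exists.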
RAVEL enhances image generation across a variety of domains by integrating contextual details often overlooked by standard models. *Note that the reference images are shown solely for illustrative purposes and are not used by our framework.
RAVEL generates complex mythological and fictional concepts without prior visual exemplars. The first three rows show global mythology concepts, while the last two rows depict characters from Project Gutenberg novels.
Our self-correction mechanism ensures accurate depictions of concepts via iterative, context-aware prompt refinement.
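The refinement loop described above can be sketched as follows. Here `generate`, `score_alignment`, and `revise_prompt` are placeholder callables standing in for the diffusion model and the multi-aspect feedback model; the function names, aspect keys, and loop structure are assumptions for illustration, not the SRD module's actual implementation.

```python
# Hypothetical sketch of iterative, multi-aspect prompt refinement.
# All callables are placeholders; this is not the paper's code.

def self_correct(prompt, generate, score_alignment, revise_prompt,
                 max_rounds=2, threshold=0.9):
    """Generate an image, score it on multiple alignment aspects, and
    refine the prompt until every aspect clears `threshold` or the
    round budget is exhausted."""
    image = generate(prompt)
    for _ in range(max_rounds):
        # e.g. {"attributes": 0.7, "narrative": 0.95, "semantics": 0.85}
        scores = score_alignment(image, prompt)
        failing = {k: v for k, v in scores.items() if v < threshold}
        if not failing:
            break  # all aspects aligned; stop early
        # Fold the failing aspects back into a revised prompt.
        prompt = revise_prompt(prompt, failing)
        image = generate(prompt)
    return image, prompt
```

Keeping the feedback per-aspect (rather than a single scalar score) lets the revision step target exactly the attributes or relations the current image gets wrong.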
We conduct a comprehensive evaluation of RAVEL in two stages, assessing both its foundational RAG component and its image generation. We benchmark our approach against SOTA T2I models, such as Flux, SDXL, and DALL-E 3, on image generation and two rounds of self-correction.
We evaluate RAVEL across four benchmarks: the standard T2I-CompBench and our three newly proposed benchmarks, MythoBench, Rare-Concept-1K, and NovelBench. Each benchmark targets a distinct challenge: compositional accuracy, symbolic complexity, fine-grained rarity, and zero-shot generalization, respectively. RAVEL consistently outperforms all baselines on metrics such as Attribute Accuracy, Context Relevance, and Visual Fidelity.
@misc{venkatesh2025ravelrareconceptgeneration,
title={RAVEL: Rare Concept Generation and Editing via Graph-driven Relational Guidance},
author={Kavana Venkatesh and Yusuf Dalva and Ismini Lourentzou and Pinar Yanardag},
year={2025},
eprint={2412.09614},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.09614},
}