Grounding text-to-image diffusion models for controlled high-quality image generation

dc.contributor.authorSüleyman, Ahmad
dc.date.accessioned2025-11-13T06:31:17Z
dc.date.available2025-11-13T06:31:17Z
dc.date.issued2025
dc.departmentTAÜ, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü
dc.description.abstractLarge-scale text-to-image (T2I) diffusion models have emerged as the new state-of-the-art image generative models, demonstrating outstanding performance in synthesizing diverse, high-quality visuals from natural language text captions. Image generative models have a wide range of applications, including content creation, image editing, and medical imaging. Although simple and powerful, a text prompt alone is insufficient for tailoring the generation process to produce customized outputs. To address this limitation, multiple layout-to-image models have been developed to control the generation process using a broad array of layouts such as segmentation maps, edges, and human keypoints. In this thesis, we propose ObjectDiffusion, a model that builds on top of cutting-edge image generative frameworks to seamlessly extend T2I models with object names and their corresponding bounding boxes. Specifically, we make substantial modifications to the network architecture introduced in ControlNet to integrate it with the condition processing and injection techniques proposed in GLIGEN. ObjectDiffusion is initialized with pre-trained parameters to leverage the generation knowledge obtained from training on large-scale datasets. We fine-tune ObjectDiffusion on the COCO2017 training dataset and evaluate it on the COCO2017 validation dataset. Our model achieves an AP score of 27.4, an AP50 of 46.6, an AP75 of 28.2, an AR of 44.5, and an FID of 19.8, outperforming the current SOTA model trained on open-source datasets in the AP50, AR, and FID metrics. ObjectDiffusion demonstrates a distinctive capability to synthesize diverse high-quality, high-fidelity images that seamlessly conform to the semantic and spatial control inputs. Evaluated in qualitative and quantitative tests, ObjectDiffusion exhibits remarkable grounding abilities in closed-set and open-set settings across a wide variety of contexts. The qualitative assessment verifies the ability of ObjectDiffusion to integrate multiple grounding entities of different sizes and locations. The results of the ablation studies highlight the efficacy of our proposed weight initialization in harnessing the pre-training knowledge to enhance the performance of the conditional model.
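
As background for the abstract's mention of GLIGEN-style condition processing and injection, the following minimal PyTorch sketch illustrates how object labels and bounding boxes can be turned into grounding tokens and fused into a denoiser's visual tokens through a zero-initialized gated self-attention layer, so the pre-trained model's behavior is preserved at the start of fine-tuning. It follows the published GLIGEN mechanism in spirit only; every module name, dimension, and toy input below is an illustrative assumption, not code from the thesis.

import torch
import torch.nn as nn

def fourier_encode(boxes: torch.Tensor, num_freqs: int = 8) -> torch.Tensor:
    """Encode (N, 4) normalized xyxy boxes with sin/cos Fourier features."""
    freqs = 2.0 ** torch.arange(num_freqs, device=boxes.device)   # (F,)
    angles = boxes.unsqueeze(-1) * freqs                          # (N, 4, F)
    feats = torch.cat([angles.sin(), angles.cos()], dim=-1)       # (N, 4, 2F)
    return feats.flatten(1)                                       # (N, 8F)

class GroundingTokenizer(nn.Module):
    """Fuse a label text embedding with its Fourier-encoded box into one
    grounding token per object (dimensions are assumptions)."""
    def __init__(self, text_dim: int = 768, num_freqs: int = 8, dim: int = 320):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(text_dim + 8 * num_freqs, dim), nn.SiLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, label_emb: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([label_emb, fourier_encode(boxes)], dim=-1))

class GatedSelfAttention(nn.Module):
    """Visual tokens attend over [visual; grounding] tokens; the residual is
    scaled by a learnable tanh gate initialized at zero, so the pre-trained
    weights initially dominate and grounding is learned gradually."""
    def __init__(self, dim: int = 320, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.gate = nn.Parameter(torch.zeros(1))  # zero-initialized gate

    def forward(self, visual: torch.Tensor, grounding: torch.Tensor) -> torch.Tensor:
        n_vis = visual.shape[1]
        tokens = self.norm(torch.cat([visual, grounding], dim=1))
        out, _ = self.attn(tokens, tokens, tokens)
        return visual + torch.tanh(self.gate) * out[:, :n_vis]

# Toy usage: two grounded objects and one flattened U-Net feature map.
tokenizer = GroundingTokenizer()
fusion = GatedSelfAttention()
labels = torch.randn(2, 768)                     # stand-ins for label embeddings
boxes = torch.tensor([[0.1, 0.2, 0.5, 0.8],      # normalized xyxy coordinates
                      [0.6, 0.5, 0.9, 0.9]])
grounding = tokenizer(labels, boxes).unsqueeze(0)  # (1, 2, 320)
visual = torch.randn(1, 64 * 64, 320)              # (1, tokens, channels)
visual = fusion(visual, grounding)                 # grounded visual tokens
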
dc.identifier.citationSüleyman, A. (2025). Grounding text-to-image diffusion models for controlled high-quality image generation. Türk-Alman Üniversitesi, Fen Bilimleri Enstitüsü.
dc.identifier.urihttps://hdl.handle.net/20.500.12846/2096
dc.language.isoen
dc.publisherTürk-Alman Üniversitesi, Fen Bilimleri Enstitüsü
dc.relation.publicationcategoryTez
dc.rightsinfo:eu-repo/semantics/openAccess
dc.subjectStable Diffusion
dc.subjectKontrollü görüntü üretken modeli
dc.subjectÜretken modeller
dc.subjectDifüzyon modelleri
dc.subjectGörüntü üretimi
dc.subjectMetinden görüntü üretimi
dc.subjectTemellendirilmiş görüntü üretimi
dc.subjectControlled Image Generative Models
dc.subjectConditional Image Generative Models
dc.subjectGrounded Image Generative Models
dc.subjectText-to-Image Models
dc.subjectImage Generation
dc.subjectGenerative Models
dc.subjectDiffusion Models
dc.titleGrounding text-to-image diffusion models for controlled high-quality image generation
dc.title.alternativeKontrollü yüksek kaliteli görüntü üretimi için metin-görüntü difüzyon modellerinin koşullandırılması
dc.typeMaster Thesis

Files

Original bundle
Name: Ahmad_SÜLEYMAN_216107002.pdf
Size: 36.82 MB
Format: Adobe Portable Document Format