diff --git a/docs/index.html b/docs/index.html
index ce6b89d..06af4dd 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -154,6 +154,9 @@

ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty
Princeton University
+ * Equal Contribution
[NeurIPS 2024 D&B]
@@ -206,6 +209,12 @@

A Compositional Image Generation
Slides
+ Poster
@@ -412,20 +421,23 @@

Leaderboard

LAION-5B Concept Diversity


Performance on Individual Concept Categories (k=1)

- The LAION-5B dataset is analyzed for concept diversity, the heatmap below showcases the frequency of these visual concepts in sampled captions.
+ We evaluate the performance of T2I models across different concept categories. Color and style are easier, with all models achieving high scores. Performance is lower for generating specific numbers of objects and spatial relationships, with varying results for texture and size. Overall, DALL·E 3 outperforms others in all categories.

- Concept Diversity in LAION-5B Dataset
+ Performance Across Concept Categories

- Concept Diversity in LAION-5B Dataset. Left: Heatmap of sampled captions shows colors and styles are most frequent; shapes and spatial relationships are least. Right: Most examples include 2-3 concepts.
+ Performance Across Concept Categories. We evaluate T2I models across concept categories, finding high scores for color and style but lower performance for object counts and spatial relationships. DALL·E 3 outperforms others across all categories.

@@ -438,16 +450,35 @@

LAION-5B Concept Diversity


Performance on Individual Concept Categories (k=1)


Performance of Compositional Generation (k > 1)


- We evaluate the performance of T2I models across different concept categories. Color and style are easier, with all models achieving high scores. Performance is lower for generating specific numbers of objects and spatial relationships, with varying results for texture and size. Overall, DALL·E 3 outperforms others in all categories.
+ ConceptMix Shows Stronger Discriminative Power: We compare five models using 3-in-1 and GPT-4V scores (global prompt-level) from T2I-CompBench, and ConceptMix with varying difficulty levels (k). ConceptMix clearly distinguishes model performance, with gaps widening as k increases.
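To make the difficulty-k comparison concrete, here is a minimal sketch of how such scores could be aggregated. It is an illustration, not the authors' evaluation code: it assumes each generated image comes with k binary concept checks (e.g., from an automated grader) and counts an image as correct only if every check passes; the function names and the strict all-pass criterion are assumptions.

```python
# Minimal sketch (assumed protocol, not the paper's exact code): each image at
# difficulty k has k binary concept checks; an image counts as correct only if
# all of its checks pass.
from typing import Dict, List


def conceptmix_score(checks_per_image: List[List[bool]]) -> float:
    """Fraction of images whose concept checks all pass."""
    if not checks_per_image:
        return 0.0
    passed = sum(1 for checks in checks_per_image if all(checks))
    return passed / len(checks_per_image)


def scores_by_difficulty(results: Dict[int, List[List[bool]]]) -> Dict[int, float]:
    """Map difficulty level k -> score; gaps between models tend to widen as k grows."""
    return {k: conceptmix_score(images) for k, images in sorted(results.items())}


# Hypothetical per-image check results for one model at k=1 and k=3.
example = {
    1: [[True], [True], [False]],
    3: [[True, True, False], [True, True, True]],
}
print(scores_by_difficulty(example))  # {1: 0.666..., 3: 0.5}
```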


Qualitative Performance of Different T2I Models


+ We compare the qualitative performance of different T2I models (SD v1.4, SD v2.1, PixArt alpha, Playground v2.5, DALL·E 3) across varying levels of compositional complexity (k = 1...7). As prompts become more complex, image quality degrades. DALL·E 3 performs best, while SD v1.4 performs worst.

- Performance Across Concept Categories
+ Qualitative Performance Comparison

- Performance Across Concept Categories. We evaluate T2I models across concept categories, finding high scores for color and style but lower performance for object counts and spatial relationships. DALL·E 3 outperforms others across all categories.
+ Qualitative Comparison: Visual comparison of generated images across different models and complexity levels (k), showing degrading performance with increasing prompt complexity.

@@ -456,17 +487,21 @@

Performance on Individual Concept Categories (k=1)


Performance of Compositional Generation (k > 1)


LAION-5B Concept Diversity


+ The LAION-5B dataset is analyzed for concept diversity; the heatmap below shows the frequency of these visual concepts in sampled captions.

+ Concept Diversity in LAION-5B Dataset

- ConceptMix Shows Stronger Discriminative Power: ConceptMix, with varying difficulty levels (k), clearly distinguishes model performance, with gaps widening as k increases.
+ Concept Diversity in LAION-5B Dataset. Left: Heatmap of sampled captions shows colors and styles are most frequent; shapes and spatial relationships are least. Right: Most examples include 2-3 concepts.
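For illustration, a minimal sketch of this kind of caption analysis follows. It is not the paper's pipeline: the concept categories and keyword lists are assumed placeholders, and it simply counts how many sampled captions mention at least one keyword from each category.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Illustrative (assumed) concept categories and keywords; the paper's taxonomy
# is richer and not reproduced here.
CONCEPT_KEYWORDS = {
    "color": ["red", "blue", "green", "yellow"],
    "style": ["painting", "sketch", "watercolor"],
    "shape": ["round", "square", "triangular"],
    "spatial": ["left of", "right of", "above", "below"],
}


def concept_frequencies(captions: Iterable[str]) -> Dict[str, int]:
    """Count how many captions mention at least one keyword from each category."""
    counts: Counter = Counter()
    for caption in captions:
        text = caption.lower()
        for category, words in CONCEPT_KEYWORDS.items():
            if any(re.search(rf"\b{re.escape(w)}\b", text) for w in words):
                counts[category] += 1
    return dict(counts)


sample = ["A red round ball in an oil painting", "A dog to the left of a cat"]
print(concept_frequencies(sample))
# e.g. {'color': 1, 'shape': 1, 'style': 1, 'spatial': 1}
```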