theally
about 1 year ago

Please see our Quickstart Guide to Stable Diffusion 3.5 for all the latest info!

Stable Diffusion 3.5 Medium is a text-to-image model built on an improved Multimodal Diffusion Transformer (MMDiT-X). It offers better image quality, typography, complex-prompt understanding, and resource efficiency.

Please note: This model is released under the Stability Community License. Visit Stability AI to learn more, or contact us for commercial licensing details.

Model Description

  • Developed by: Stability AI

  • Model type: MMDiT-X text-to-image generative model

  • Model Description: This model generates images based on text prompts. It is an improved Multimodal Diffusion Transformer (https://arxiv.org/abs/2403.03206), MMDiT-X, that uses three fixed, pretrained text encoders, QK-normalization to improve training stability, and dual attention blocks in the first 12 transformer layers.
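QK-normalization normalizes query and key vectors before their dot product, so attention logits cannot grow without bound as activation magnitudes drift during training. The sketch below is a minimal illustration, assuming RMSNorm as the normalizer (the form used in the SD3 line of models) with its learnable per-channel gain omitted, and a single attention head on plain Python lists. The function names `rms_norm` and `attention_logit` are hypothetical, not part of any Stability AI codebase.

```python
import math
from typing import List, Sequence


def rms_norm(v: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Scale a vector to (approximately) unit root-mean-square.

    The learnable per-channel gain of a real RMSNorm layer is omitted here.
    """
    rms = math.sqrt(sum(x * x for x in v) / len(v) + eps)
    return [x / rms for x in v]


def attention_logit(q: Sequence[float], k: Sequence[float], qk_norm: bool = True) -> float:
    """Scaled dot-product logit for one query/key pair, optionally QK-normalized."""
    if qk_norm:
        q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


if __name__ == "__main__":
    big = [100.0] * 4  # large activations, as might appear when training drifts
    print(attention_logit(big, big, qk_norm=False))  # 20000.0
    print(round(attention_logit(big, big), 4))       # 2.0
```

After normalization each vector has squared norm at most d, so (ignoring the gain) the logit for a d-dimensional head is bounded by sqrt(d) no matter how large the raw activations become, which is what keeps the softmax from saturating.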

License

  • Community License: Free for research, non-commercial, and commercial use for organizations or individuals with less than $1M in total annual revenue. More details can be found in the Community License Agreement. Read more at https://stability.ai/license.

  • For individuals and organizations with annual revenue above $1M: please contact us to get an Enterprise License.

Implementation Details

  • MMDiT-X: Introduces self-attention modules in the first 13 layers of the transformer, enhancing multi-resolution generation and overall image coherence.

  • QK Normalization: Implements the QK normalization technique to improve training stability.

  • Mixed-Resolution Training:

    • Progressive training stages: 256 → 512 → 768 → 1024 → 1440 resolution

    • The final stage included mixed-scale image training to boost multi-resolution generation performance

    • Extended positional embedding space to 384x384 (latent) at lower resolution stages

    • Employed random crop augmentation on positional embeddings to enhance transformer layer robustness across the entire range of mixed resolutions and aspect ratios. For example, given a 64x64 latent image, we add a randomly cropped 64x64 embedding from the 192x192 embedding space during training as the input to the x stream.
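The random-crop augmentation on positional embeddings can be sketched as follows. This is a minimal pure-Python illustration, not Stability AI's training code: the helper names (`make_pos_embed_grid`, `random_crop_pos_embed`, `add_pos_embed`) are hypothetical, the embedding table holds placeholder values instead of learned weights, and a tiny embedding width of 4 is assumed so the example runs quickly. The grid sizes follow the example above (a 64x64 crop taken from a 192x192 embedding space).

```python
import random
from typing import List

# A grid of embedding vectors indexed as grid[row][col][channel].
Grid = List[List[List[float]]]


def make_pos_embed_grid(size: int, dim: int) -> Grid:
    """Hypothetical stand-in for a learned size x size positional-embedding table.

    Each cell holds its flattened index so crops are easy to inspect;
    a real model would learn these values.
    """
    return [[[float(r * size + c)] * dim for c in range(size)] for r in range(size)]


def random_crop_pos_embed(table: Grid, h: int, w: int, rng: random.Random) -> Grid:
    """Return a randomly positioned h x w window from the embedding table."""
    max_h, max_w = len(table), len(table[0])
    if h > max_h or w > max_w:
        raise ValueError("latent grid is larger than the positional-embedding space")
    top = rng.randint(0, max_h - h)  # randint bounds are inclusive
    left = rng.randint(0, max_w - w)
    return [row[left:left + w] for row in table[top:top + h]]


def add_pos_embed(tokens: Grid, pos: Grid) -> Grid:
    """Add the cropped positional embedding element-wise to the x-stream tokens."""
    return [
        [[t + p for t, p in zip(tok_vec, pos_vec)] for tok_vec, pos_vec in zip(tok_row, pos_row)]
        for tok_row, pos_row in zip(tokens, pos)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    table = make_pos_embed_grid(size=192, dim=4)  # full embedding space
    latent = [[[0.0] * 4 for _ in range(64)] for _ in range(64)]  # dummy 64x64 latent
    crop = random_crop_pos_embed(table, 64, 64, rng)
    x_input = add_pos_embed(latent, crop)
    print(len(x_input), len(x_input[0]), len(x_input[0][0]))  # 64 64 4
```

Because the crop offset changes every training step, each latent position is paired with many different regions of the embedding table, so no part of the table is tied to a single image size or aspect ratio.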

These enhancements collectively contribute to the model's improved performance in multi-resolution image generation, coherence, and adaptability across various text-to-image tasks.
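To see what the progressive schedule means for the transformer's sequence length, the stage resolutions can be converted into latent sizes and token counts. This sketch assumes the SD3-family conventions of an 8x VAE downsampling factor and 2x2 latent patches per transformer token; both constants are assumptions for illustration and are not stated on this page. It also assumes square images.

```python
# Assumed constants (SD3-family conventions, not taken from this page).
VAE_DOWNSAMPLE = 8  # each image side shrinks by 8x in latent space
PATCH_SIZE = 2      # each transformer token covers a 2x2 latent patch

# Progressive training stages from the model card (square image side, in pixels).
STAGES_PX = [256, 512, 768, 1024, 1440]


def latent_side(px: int) -> int:
    """Side length of the latent grid for a square image of `px` pixels."""
    return px // VAE_DOWNSAMPLE


def token_count(px: int) -> int:
    """Number of image tokens the transformer sees for a square `px` image."""
    side = latent_side(px) // PATCH_SIZE
    return side * side


if __name__ == "__main__":
    for px in STAGES_PX:
        side = latent_side(px)
        print(f"{px}px -> latent {side}x{side} -> {token_count(px)} tokens")
```

Under these assumptions the stages yield 256, 1,024, 2,304, 4,096 and 8,100 image tokens, so the 1440 px stage processes roughly 32 times as many tokens as the 256 px stage, and the 64x64 latent in the example above corresponds to a 512 px image.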

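  • Text Encoders:

    • CLIPs: OpenCLIP-ViT/G, CLIP-ViT/L, context length 77 tokens

    • T5: T5-xxl, context length 77/256 tokens at different stages of training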

  • Training Data and Strategy:

    This model was trained on a wide variety of data, including synthetic data and filtered publicly available data.

For more technical details of the original MMDiT architecture, please refer to the research paper (https://arxiv.org/abs/2403.03206).

Usage & Limitations

  • While this model can handle long prompts, you may observe artifacts at the edges of generated images when the prompt exceeds 256 T5 tokens. Pay attention to the token limits when using this model in your workflow, and shorten prompts if artifacts become too obvious.

  • The medium model has a different training data distribution than the large model, so the same prompt may produce noticeably different results on the two models.

  • We recommend sampling with Skip Layer Guidance for better structural and anatomical coherence.
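Skip Layer Guidance (SLG) adds an extra guidance term on top of classifier-free guidance (CFG): the model is run an additional time on the conditional input with a few transformer layers skipped, and the prediction is pushed away from that deliberately degraded output. The sketch below only shows how the three predictions could be combined at one sampling step; the skipped-layer forward pass itself is not shown. It is modeled on a common way SLG is combined with CFG, uses plain Python lists in place of tensors, and its function name, default step window, and example scales are hypothetical values for illustration rather than official settings.

```python
from typing import List, Sequence

Vector = List[float]


def slg_combine(
    cond: Sequence[float],
    uncond: Sequence[float],
    cond_skip: Sequence[float],
    cfg_scale: float,
    slg_scale: float,
    step: int,
    total_steps: int,
    slg_start: float = 0.01,  # hypothetical: fraction of sampling where SLG begins
    slg_end: float = 0.20,    # hypothetical: fraction of sampling where SLG ends
) -> Vector:
    """Combine model predictions for one step using CFG plus Skip Layer Guidance.

    cond      -- prediction with the text prompt
    uncond    -- prediction with the negative/empty prompt
    cond_skip -- prediction with the text prompt but some transformer layers skipped
    """
    # Standard classifier-free guidance.
    guided = [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]

    # Add the SLG term only inside the chosen window of the sampling schedule.
    progress = step / total_steps
    if slg_scale > 0 and slg_start < progress < slg_end:
        guided = [g + slg_scale * (c - s) for g, c, s in zip(guided, cond, cond_skip)]
    return guided


if __name__ == "__main__":
    # Toy one-element "predictions"; step 2 of 20 falls inside the SLG window.
    print(slg_combine([1.0], [0.0], [0.5], cfg_scale=4.0, slg_scale=2.0, step=2, total_steps=20))  # [5.0]
    # Step 10 of 20 is outside the window, so only CFG is applied.
    print(slg_combine([1.0], [0.0], [0.5], cfg_scale=4.0, slg_scale=2.0, step=10, total_steps=20))  # [4.0]
```

Restricting the extra term to a window of the schedule keeps its cost and its effect limited to part of sampling; which layers to skip and how large `slg_scale` should be are tuning choices for the workflow in use.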


What is Stable Diffusion 3.5 Medium?

Stable Diffusion 3.5 Medium is an image-generation AI model of type Safetensors / Checkpoint, developed by Stability AI and shared by AI community user theally. It is a base model of the Stable Diffusion 3.5 (SD 3.5) family, tagged for use cases such as base model, stable diffusion, and stability ai.


Can I download Stable Diffusion 3.5 Medium?

Yes! You can download the latest version of Stable Diffusion 3.5 Medium from this page.

How to use Stable Diffusion 3.5 Medium?

To use Stable Diffusion 3.5 Medium, download the model checkpoint file and set up a UI for running Stable Diffusion models (for example, ComfyUI, which the official workflow on this page targets, or AUTOMATIC1111). Then provide the model with a detailed text prompt to generate an image, and experiment with different prompts and settings to achieve the desired results. If this sounds a bit complicated, check out our initial guide to Stable Diffusion; it might help. And if you really want to dive deep into AI image generation and understand how to set up AUTOMATIC1111 to use Safetensors / Checkpoint AI Models like Stable Diffusion 3.5 Medium, check out our crash course in AI image generation.

Download (1.85 KB)


Info

Base model: SD 3.5

Latest version (Workflow): 1 File


About this version: Workflow

Official Stability AI ComfyUI workflow for SD 3.5 Medium

2 Versions
