Description

SDXL-Simulacrum-V22β

-> This one is a teaser for full v3β, hope you all like it.

THIS MODEL is essentially an unabridged model. You can stray to the naughty place if you want, or you can make really cool shit like I did. Take your pick.

I'm starting to see the full power of the model cut through now, like a knife through butter. The NSFW is a bit under wraps for now, but be wary.

This model is designed specifically with the world in mind. Everything, everyone, all makes, all creeds, all objects, all entities, all beings, and whatever else. I really don't care; as long as the AI can automatically caption it, it goes in.

I taught the CLIP_L 3d environments, 3d depictions, and more. Flux taught it timestep progression through slow trickle learning with about 10 million samples, and the battle with CLIP_G has been legendary.

This simulates the world; it doesn't just guesstimate a depiction or a still-image offset location. It UNDERSTANDS the environments. It KNOWS what it's looking at, at which angle, and has a good idea how to timestep from point A to B.

This version was trained specifically with base timesteps higher than 0. It was fed my entire current 3d dataset (70k 3d images) for a single epoch, not broken up into segments, together with a large portion of the original dataset images, for an epoch of about 120k images in total.

The outcome is actually insane looking. I had very few expectations going into this. I expected it to fail, but here it is. A fantastic outcome. Enjoy.

The negative prompt requirement has shrunk a bit, but beware there's a lot of untested territory.

The background objects, seats, tables, and everything are solidifying the experience in ways I could have never predicted. This is quite the treat.

On a related note, I'm pretty sure it learned what a dildo is finally, so those should stop showing up as often.

This CLIP_L understands more plain English than you'd think, as it was a sparring partner with the T5 for a long time; however, it also understands the importance of context switching because I taught it attention grids. The large array of tags sees to the rest, usually substituting at least some information somewhere for what you're expecting or wanting to see. It has things like trash cans, bicycles, or asses. Take your pick.

Good luck playing with it.

I'll most likely release another 50 epochs on much more data by next week, barring hardware issues.

Generation Recommendations:
DPM-2M-SDE
-> BETA / KARRAS
-> Steps 14-50 -> 50
-> CFG 4.5-8.5 -> 6.5

DPM-2S-Ancestral
-> BETA / KARRAS
-> Steps 32
-> CFG 5 - 8 -> 6

DPM-2M
-> BETA / KARRAS
-> Steps 20-40 -> 40
-> CFG 7 -> 7

Euler doesn't work very well.
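
For scripted or batch generation, the recommendations above can be kept as a small lookup table. This is only a sketch in Python: the names `SamplerPreset`, `PRESETS`, `clamp`, and `pick_settings` are hypothetical, the sampler and scheduler strings are just the labels used above (not identifiers from any particular UI or library), and defaulting to the first listed scheduler is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerPreset:
    schedulers: tuple      # scheduler labels as listed above
    steps_range: tuple     # (min, max) steps from the recommendation
    steps_default: int     # the value after the "->"
    cfg_range: tuple       # (min, max) CFG from the recommendation
    cfg_default: float     # the value after the "->"


# Values copied from the recommendations; nothing here is tuned or measured.
PRESETS = {
    "DPM-2M-SDE": SamplerPreset(("BETA", "KARRAS"), (14, 50), 50, (4.5, 8.5), 6.5),
    "DPM-2S-Ancestral": SamplerPreset(("BETA", "KARRAS"), (32, 32), 32, (5.0, 8.0), 6.0),
    "DPM-2M": SamplerPreset(("BETA", "KARRAS"), (20, 40), 40, (7.0, 7.0), 7.0),
}


def clamp(value, bounds):
    """Keep value inside the inclusive (low, high) bounds."""
    low, high = bounds
    return max(low, min(high, value))


def pick_settings(sampler, steps=None, cfg=None):
    """Return the recommended settings, clamping any overrides into range."""
    preset = PRESETS[sampler]
    return {
        "sampler": sampler,
        # Assumption: use the first listed scheduler unless you choose otherwise.
        "scheduler": preset.schedulers[0],
        "steps": preset.steps_default if steps is None else clamp(steps, preset.steps_range),
        "cfg": preset.cfg_default if cfg is None else clamp(cfg, preset.cfg_range),
    }
```

Calling `pick_settings("DPM-2M-SDE")` returns the recommended defaults; passing `steps` or `cfg` outside the listed range pulls the value back to the nearest recommended bound.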

PROMPT BASICS HERE

<CAPTIONS HERE>

good aesthetic, very aesthetic, most aesthetic, masterpiece,
anime, 
<CHARACTERS HERE>

<ACTION CAPTIONS HERE>

<OFFSETS AND GRID GO HERE>

<CHARACTER TRAITS HERE>

highres, absurdres, newest, 2010s

Try not to breach 75 tokens for this version. The CLIP_L has been trained with 225 tokens, but they definitely aren't smart enough yet.
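
The prompt layout above can be assembled programmatically. This is a minimal sketch in Python, assuming a plain comma-joined prompt; `build_prompt` and `rough_token_estimate` are hypothetical helper names. The estimate only counts word-like pieces and punctuation marks, so it is a crude stand-in for the real CLIP tokenizer count and can be off in either direction.

```python
import re

# Fixed tag groups taken from the template above.
AESTHETIC_TAGS = ["good aesthetic", "very aesthetic", "most aesthetic", "masterpiece"]
STYLE_TAGS = ["anime"]
TAIL_TAGS = ["highres", "absurdres", "newest", "2010s"]
TOKEN_BUDGET = 75  # the soft limit suggested for this version


def build_prompt(captions, characters=(), actions=(), offsets=(), traits=()):
    """Join the sections in the template order, skipping empty entries."""
    parts = [
        captions,
        *AESTHETIC_TAGS,
        *STYLE_TAGS,
        *characters,
        *actions,
        *offsets,   # offset and grid tags from the tag guide go here
        *traits,
        *TAIL_TAGS,
    ]
    return ", ".join(p.strip() for p in parts if p and p.strip())


def rough_token_estimate(prompt):
    """Crude estimate: one count per word-like piece or punctuation mark.

    This is NOT the CLIP tokenizer; treat the result as a warning light only.
    """
    return len(re.findall(r"[A-Za-z0-9_]+|[^\sA-Za-z0-9_]", prompt))


prompt = build_prompt(
    "a woman reading at a cafe table",
    characters=["1girl"],
    actions=["sitting"],
    traits=["long hair"],
)
if rough_token_estimate(prompt) > TOKEN_BUDGET:
    print("Prompt is probably over the 75-token budget; trim some tags.")
```

For an exact count, run the finished prompt through the tokenizer your UI or pipeline actually uses.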

This helps make most images better.

good aesthetic, very aesthetic, most aesthetic, masterpiece,

TLDR: Use this NEGATIVE PROMPT to get started.

lowres,
nsfw, explicit, questionable, 
displeasing, very displeasing, disgusting, 

text, size_f text, size_h text, size_q text,
censored, censor bar,
monochrome, greyscale, 
bad anatomy, ai-generated, ai generated, jewelry,

watermark, 
hand, 
blurry hand,
bad hands, missing digit, extra digit, 
extra arm, missing arm, 
convenient arm, convenient leg, 
arm over shoulder, 
synthetic_woman,

Barebones negative: use at your own peril.

lowres, 
displeasing, very displeasing, disgusting, 

text, 
monochrome, greyscale, comic, 
synthetic_woman,

Currently the bad hands tag is entirely overwhelmed by GOOD HANDS, which is why it's not working. I did some calculations and found vastly more tag positions identified as good hands than as bad hands, so negative prompting simply starts burning good hands.

I'll need a new solution for this, as the cross contamination is becoming an issue.

For the depiction offset tag guide see here

The model was trained primarily at 1024x1024 with bucketing. Images can stretch up to 1300x1300 or so, but they fall apart under a bit of pressure, so don't rely on that size too much.

There were multiple epochs burned in at 1024x1024 entirely devoted to teaching square, and then at 1216x832 and 832x1216 for those relational offsets, so grids are more reliable at those sizes.

1216x832, 832x1216, 1024x1024, 832x832, 768x768, 512x768, and 768x512 will likely work.

If you want MORE reliable offsets, use one of the 3 primary sizes:

1024x1024, 1216x832, 832x1216
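
If you start from an arbitrary target size, one way to land on a listed size is to pick the one with the closest aspect ratio. This is a sketch in Python under that assumption; `closest_size` is a hypothetical helper, and comparing log aspect ratios (with pixel area as a tie-breaker) is an illustrative choice, not something the training used.

```python
import math

# Sizes listed above: the 3 primary sizes, then the ones that "will likely work".
PRIMARY_SIZES = [(1024, 1024), (1216, 832), (832, 1216)]
SECONDARY_SIZES = [(832, 832), (768, 768), (512, 768), (768, 512)]


def closest_size(width, height, primary_only=True):
    """Return the listed (width, height) closest to the requested shape."""
    candidates = PRIMARY_SIZES if primary_only else PRIMARY_SIZES + SECONDARY_SIZES
    target_aspect = math.log(width / height)
    target_area = width * height

    def distance(size):
        w, h = size
        # Compare aspect ratios first; break ties by how close the area is.
        return (abs(math.log(w / h) - target_aspect), abs(w * h - target_area))

    return min(candidates, key=distance)


print(closest_size(1920, 1080))                      # a landscape request
print(closest_size(600, 900, primary_only=False))    # a small portrait request
```

Leaving `primary_only=True` keeps you on the three sizes that the offsets were trained on most heavily.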

As of V2β, a total of ~5.5 million samples have been trained using a very complex series of guesses, mental work, and carefully planned formulas based on learning experiences.

The SDXL training still has A LOT further to go; it's in the very early stages of pre-v1, so keep that in mind when going into this experience.

It currently has some pretty shitty hands. I'll need to work something out to fix them, but as it stands they are pretty bad. I recommend using a hand fixing lora or adetailer, if it can even find the mangled hand.

It ran 25 epochs on the first 50k images in about 5k image segments, then it ran an additional 29 on the 115k image dataset in about 15k image segments.

The dataset is nearly 300,000 images total, but they weren't all taught at once. The model was trained using a curriculum training pattern similar to Flux-SCHNELL, except this time the real focus wasn't the model; it was the actual stars of the show.

The primary training consists of a 10 layer combination of:

CLIP_L_OMEGAβ - Trained with nearly 22.5 million samples from multiple diffusion-based models and stand-alone training.

CLIP_L has been specifically tuned to understand highly complex phrases and fixates entirely on sections of the screen for very specific and very intricate utility when paired with CLIP_G.

CLIP_G_OMEGAβ - Trained with 5 million overlapping re-imposed reburned samples normalized and dot combined.

CLIP_G was specifically frozen for 20 epochs of the core 50,000 Simulacrum image combination pack, using a multitude of loras, so CLIP_L could learn from it. Afterward, all of the CLIP_G changes were merged into the frozen CLIP_G in a dot normalization interpolation fashion from original to final, based on a simple learning curve. After that, it was taught everything depiction-offset related so it could properly teach CLIP_L how to behave.
The combination is an ongoing experiment with high yield results.

1 Full Finetune

3 LOKR

2 LOHA

5 LyCORIS

Each was trained by me; I am merging no out-of-scope LORAs into the base mix. I have merged since the finetune, however.

Finally, after both CLIPs were acclimated to SDXL, the pictures started to make sense. So far the outcome is a bit unpredictable: SIMV4_CLIP_L was tuned directly to Flux1D and then Flux1S, so it's hard to say how it will behave. It does seem that they behave with anything Base SDXL related, which means they haven't deviated so far from the course as to be unreachable, unlike other CLIP_L and CLIP_G models such as Illustrious or Pony.

I have tested OMEGA_L and OMEGA_G with SD35 and had fair results with more complex prompts.

SDXL-Simulacrum-V22β - [SFW/NSFW]

v21β

text-to-image Safetensors / Checkpoint AI Model

300 downloads

Model Details

Type: AI Model
Task: text-to-image
Subtype: Safetensors / Checkpoint AI Model
Created: Jan 26, 2025
Updated: July 21, 2026

Available Files

sdxlSimulacrumV22SFW_v21.safetensors
