Model Information

Description
====================================
There are 2 models:
DRm: DryRender-mae Anime Style
SRm: SemiReal-mae
====================================
A fine-tune of vanilla SD1.5, trained to mimic my art style [My Instagram: https://www.instagram.com/ray_vietii ]
I fine-tuned the UNet for effective_total_steps = ( 2500 steps + ( 500 dataset_image_count )) iteration count ), so that the model understands the style without needing some sort of "trigger word". Then I merged in my LoRA, which is trained on the same art style (my own art style). Unlike the UNet fine-tune, the LoRA has already learned the context and tags through its text encoder training (text_encoder_lr).
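Merging a LoRA into a checkpoint usually means folding its low-rank update into each targeted base weight. The sketch below shows that arithmetic for a single layer, assuming the common LoRA parameterization (a `down` and an `up` matrix scaled by alpha/rank, as in kohya_ss-style LoRAs). The author's actual merge tool and settings are not stated, and `merge_lora_into_weight` is a hypothetical helper operating on toy random arrays.

```python
import numpy as np


def merge_lora_into_weight(weight, lora_down, lora_up, alpha, multiplier=1.0):
    """Fold a low-rank LoRA update into a base weight matrix.

    Hypothetical helper. Assumes the common LoRA parameterization
        W' = W + multiplier * (alpha / rank) * (up @ down)
    with lora_down of shape (rank, in_features) and
    lora_up of shape (out_features, rank).
    """
    rank = lora_down.shape[0]
    scale = multiplier * alpha / rank
    return weight + scale * (lora_up @ lora_down)


# Toy shapes standing in for one UNet linear layer (random values, not real weights).
rng = np.random.default_rng(0)
base = rng.standard_normal((8, 16))
down = rng.standard_normal((4, 16))  # rank 4
up = rng.standard_normal((8, 4))

merged = merge_lora_into_weight(base, down, up, alpha=4.0, multiplier=0.5)
delta = merged - base
# The change to the layer is low-rank: at most the LoRA rank (4).
print("update rank:", np.linalg.matrix_rank(delta, tol=1e-8))
```

Because the update is added directly into the weights, the merged checkpoint needs no separate LoRA file at inference time; `multiplier` plays the role of the LoRA strength.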
What exactly is the recipe?
I made two variants, base1 with a high unet_lr (6e-5) + LoRA and base2 with a lower unet_lr (2e-5) + LoRA, plus base0, another 2e-5 run without the LoRA.
Next, I merged these bases: base1[0.4] + base2[0.6] = base1+2.
Then base1+2[0.8] + base0[0.2], and so on; each of these merge steps is what I call an "iteration".
I repeated this recipe with slight variations, repeatedly merging the variants back into themselves. As a result, despite training on only X images, the model now has fairly broad variation.
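The merge steps in the recipe can be sketched as per-tensor weighted sums of checkpoints. This assumes each merge is a plain weighted sum over identical keys (the source does not name the merge tool); `weighted_merge` and `toy_checkpoint` are hypothetical helpers, and the random arrays stand in for real state dicts, which could be loaded as dicts of tensors (for example with safetensors' `load_file`).

```python
import numpy as np


def weighted_merge(a, b, weight_a):
    """Per-tensor weighted sum of two state dicts: weight_a * A + (1 - weight_a) * B.

    Hypothetical helper; assumes both checkpoints share identical keys and shapes.
    """
    if a.keys() != b.keys():
        raise ValueError("checkpoints must have the same keys")
    return {k: weight_a * a[k] + (1.0 - weight_a) * b[k] for k in a}


def toy_checkpoint(seed):
    """Stand-in for a loaded state dict (random arrays, not real weights)."""
    rng = np.random.default_rng(seed)
    return {
        "unet.block.weight": rng.standard_normal((4, 4)),
        "unet.block.bias": rng.standard_normal(4),
    }


base0, base1, base2 = toy_checkpoint(0), toy_checkpoint(1), toy_checkpoint(2)

# Merge step 1: base1[0.4] + base2[0.6] = base1+2
base12 = weighted_merge(base1, base2, weight_a=0.4)
# Merge step 2: base1+2[0.8] + base0[0.2]
step2 = weighted_merge(base12, base0, weight_a=0.8)

for key, value in step2.items():
    print(key, value.shape)
```

Under this assumption every step is a convex combination, so after the second step the result equals 0.32·base1 + 0.48·base2 + 0.2·base0; repeating the recipe keeps redistributing weight among the same bases rather than adding new training signal.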
Like any other base model, it is cohesive and stable: no vanilla SD1.5 leaking through, just pure style distillation. Mean Average Emergent (MaE) is my method; it allows me to "sculpt" vanilla SD1.5 with a dataset of only 43 images (as of iteration 1).
Because of this method, outputs are expected to be murky and muddy, but with the right prompting the model generates decent images.
MaE is much more efficient when you only have a bare-minimum dataset.
Additional details that are not directly relevant, since this "research" was done for the LoRA used within my models:
DDPM scheduler setting comparison:
RayVietii-DryRender:
βs = 0.0 | βe = 0.0095
Kohya_ss default setting:
βs = 0.00085 | βe = 0.012
The Empirical Standard Value:
βs = 0.0001 | βe = 0.02
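To compare the three settings listed above, the sketch below computes ᾱ_T = ∏(1 − β_t), the fraction of the original image's variance still present at the last timestep, assuming T = 1000 steps. The source does not say how each schedule is spaced between βs and βe, so both linear spacing (as in Ho et al., 2020) and the `scaled_linear` spacing that Stable Diffusion 1.x's scheduler config uses are computed; the helper names are hypothetical.

```python
import numpy as np

T = 1000  # number of diffusion steps (assumed, as in Ho et al. 2020)

SCHEDULES = {
    "RayVietii-DryRender": (0.0, 0.0095),
    "Kohya_ss default": (0.00085, 0.012),
    "Empirical standard": (0.0001, 0.02),
}


def linear_betas(beta_start, beta_end, num_steps=T):
    """Betas evenly spaced from beta_start to beta_end (DDPM 'linear' schedule)."""
    return np.linspace(beta_start, beta_end, num_steps, dtype=np.float64)


def scaled_linear_betas(beta_start, beta_end, num_steps=T):
    """Evenly spaced in sqrt(beta), then squared ('scaled_linear' variant)."""
    return np.linspace(beta_start**0.5, beta_end**0.5, num_steps, dtype=np.float64) ** 2


def final_signal(betas):
    """alpha_bar_T = prod(1 - beta_t): share of x0's variance left at t = T."""
    return float(np.prod(1.0 - betas))


for name, (b_start, b_end) in SCHEDULES.items():
    for kind, make_betas in (("linear", linear_betas), ("scaled_linear", scaled_linear_betas)):
        alpha_bar_T = final_signal(make_betas(b_start, b_end))
        print(
            f"{name:20s} {kind:13s} "
            f"alpha_bar_T={alpha_bar_T:.2e}  signal coef={alpha_bar_T ** 0.5:.3f}"
        )
```

A larger ᾱ_T means more of the dataset image survives to the final timestep; the first beta of the DryRender schedule is exactly 0.0, so the first step adds no noise.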
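Judging from the forward-diffusion formula (simplified), which takes X0 to Xt one timestep at a time (t = 1, 2, …, T), with βt ramping from βs up to βe:

$$X_t = \sqrt{1 - \beta_t}\,X_{t-1} + \sqrt{\beta_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$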
> While it may seem complex at first, the process is actually quite straightforward. Here, Xₜ represents the image at timestep t, and Xₜ₋₁ represents the image at the previous timestep. ϵ is our randomly generated unit Gaussian noise. Since it is a unit Gaussian, its variance is one. When we multiply it by the term square root of βt, its variance becomes βt. We also scale down Xₜ₋₁ by square root of 1-βt to ensure that the variance of Xₜ does not grow when we add noise. This is essentially a balancing term.
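The variance-balancing argument in the quote can be checked numerically. This is a minimal sketch on a toy unit-variance array standing in for an image; `forward_step` is a hypothetical helper implementing the single-step update the quote describes.

```python
import numpy as np


def forward_step(x_prev, beta_t, rng):
    """One DDPM forward step: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps."""
    eps = rng.standard_normal(x_prev.shape)  # unit Gaussian noise
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * eps


rng = np.random.default_rng(0)
x_prev = rng.standard_normal(1_000_000)  # toy unit-variance "image"
x_t = forward_step(x_prev, beta_t=0.02, rng=rng)

# Variance stays near 1: (1 - beta) * 1 + beta * 1 = 1.
print(round(float(x_prev.var()), 3), round(float(x_t.var()), 3))

# With beta_t = 0 (like beta_s = 0.0), the step returns its input unchanged.
print(np.array_equal(forward_step(x_prev, 0.0, rng), x_prev))  # True
```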
The βt parameter controls the amount of Gaussian noise added to the image. The authors call this the variance schedule, which ramps up at higher values of t. In the original work by [Ho et al. (2020)](https://arxiv.org/abs/2006.11239), betas are put in a linear space from β1=0.0001 to βT=0.02 with T=1000 diffusion steps. They are relatively small compared to the normalized image pixel values between [−1,1].
What this means: with βs = 0.0, the schedule starts with no noise added at all, keeping the dataset image as is, so X1 = X0 (the original image). βe = 0.0095 is significantly smaller than the recommended 0.02, so with this β the dataset image never truly becomes pure noise. In my view, that explains why the model is good at predicting: during the reverse (denoising) process, it never has to start from pure noise, so there is no "hallucination" to start off from.
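The closed form of the forward process (Ho et al., 2020) makes both points easy to check. The worked derivation below assumes the betas are linearly spaced over T = 1000 steps (the DryRender spacing is not stated) and uses the first-order approximation ln(1 − β) ≈ −β.

$$\bar\alpha_t = \prod_{i=1}^{t} (1 - \beta_i), \qquad X_t = \sqrt{\bar\alpha_t}\,X_0 + \sqrt{1 - \bar\alpha_t}\,\epsilon$$

$$\beta_1 = \beta_s = 0 \;\Rightarrow\; \bar\alpha_1 = 1 \;\Rightarrow\; X_1 = X_0$$

$$\ln \bar\alpha_T \approx -\sum_{i=1}^{T} \beta_i = -\frac{T(\beta_s + \beta_e)}{2}$$

$$\text{DryRender: } \bar\alpha_T \approx e^{-1000 \cdot 0.0095/2} = e^{-4.75} \approx 8.7 \times 10^{-3}, \qquad \text{standard: } \bar\alpha_T \approx e^{-1000 \cdot 0.0201/2} = e^{-10.05} \approx 4.3 \times 10^{-5}$$

Under these assumptions, roughly 200 times more of X0's variance survives at t = T with βe = 0.0095, and X_T keeps a small but nonzero share of the original image (signal coefficient √ᾱ_T ≈ 0.09).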
RayVietii_DryRender-Diabolical_Diffusion
RayVietii-SRm3.2-Inf
Model Details
- Type: AI Model
- Task: text-to-image
- Subtype: Safetensors / Checkpoint AI Model
- Created
- Updated
- June 6, 2026