(TL;DR: works exactly as it should, without the flaws you might encounter in other checkpoints.)
Easy and convenient prompting
Great aesthetics, anatomy, and stability along with versatility
Vibrant colors and smooth gradients without a trace of burning
Full brightness range even with epsilon
22k+ artist styles, many general styles, almost any character
No more annoying watermarks
No character bleed or related side effects (unwanted outfit, style, or composition changes)
No spawning of strange creatures, SFX in the background, or extra pairs of breasts (1, 2)
Better coherence (1, 2), prompt following, and anatomy (a significant boost over Illustrious, slight or negligible over Noob)
Artist styles look exactly as they should (and lots of new ones added)
Better prompt following without ignored tags or the need for (higher weights:1.4)
Forget about long schizo-negatives
Stable style without random fluctuations on different seeds
New characters
A large, well-balanced dataset of 4.5M pictures (0.8M with natural text captions) picked from over 12M different artworks, a significantly reworked TE and parts of the UNet, and innovative training approaches. All this, in combination with a great base model (despite a variety of problems, Illustrious is currently the best base for anime), made it possible to create a checkpoint that meets modern demands and shows unique results.
Dataset cut-off - September 2024.
It works well with both short, simple prompts and long, complex ones. However, contradictory or weird tags and concepts won't be ignored and will affect the output. No guardrails, no safeguards, no lobotomy; consider pruning schizo-prompts.
The dataset contains only booru-style tags and (simplified) natural text expressions. Despite including a share of furry content, all captions have been converted to classic booru style to avoid the problems that can arise when mixing different tagging systems. So e621 tags won't be understood properly.
~1 megapixel for txt2img, any aspect ratio with a resolution that is a multiple of 64 (1024x1024, 1152x, 1216x832,...). Euler_a, CFG 4..9 (5-7 is best), 20..28 steps. Sigmas multiply may improve results a bit; LCM/PCM and exotic samplers are untested. Hires fix - x1.5 latent + denoise 0.6, or any GAN upscaler + denoise 0.3..0.55.
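The resolution rule above can be sketched as a small helper that lists width/height pairs that are multiples of 64 and close to one megapixel. The function name, the 8% tolerance and the 2:1 aspect ratio limit are assumptions chosen for illustration, not values from the model card.

```python
def resolutions_near_one_megapixel(step=64, target=1024 * 1024,
                                   tolerance=0.08, max_ratio=2.0):
    """Return (width, height) pairs that are multiples of `step`,
    within `tolerance` of `target` pixels, and no more elongated
    than `max_ratio`. Tolerance and ratio limit are illustrative."""
    results = []
    for width in range(step, 4096 + 1, step):
        for height in range(step, 4096 + 1, step):
            pixels = width * height
            ratio = max(width, height) / min(width, height)
            close_enough = abs(pixels - target) / target <= tolerance
            if close_enough and ratio <= max_ratio:
                results.append((width, height))
    return results


if __name__ == "__main__":
    for width, height in resolutions_near_one_megapixel():
        print(f"{width}x{height}")
```

The list includes the 1024x1024 and 1216x832 sizes named above, along with their portrait counterparts.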
Only 4 quality tags:
masterpiece, best quality, low quality, worst quality
Nothing else. Meta tags like lowres have been removed; do not use them. Low-resolution images were either removed or upscaled and cleaned with DAT, depending on their importance.
worst quality, low quality, watermark
That's all, no need for "rusty trombone", "farting on prey" and the like. Do not put tags like greyscale or monochrome in the negative prompt unless you understand what you are doing. It will lead to burning and over-saturation; colors are fine out of the box.
Grids with examples, list (also can be found in "training data").
Use them with the "by " prefix; it is mandatory. Multiple artists give very interesting results and can be controlled with prompt weights.
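A minimal sketch of building the artist part of a prompt with the "by " prefix and optional weights. It assumes the `(text:weight)` emphasis syntax used by A1111-style UIs (the same form as the "(higher weights:1.4)" example earlier); the function name and the artist names are hypothetical placeholders.

```python
def artist_prompt(artists):
    """Build the artist part of a prompt.

    `artists` maps an artist name to a weight; a weight of 1.0
    means the tag is emitted without explicit emphasis.
    """
    parts = []
    for name, weight in artists.items():
        tag = f"by {name}"  # the "by " prefix is mandatory for artist styles
        if weight != 1.0:
            # A1111-style emphasis: (text:weight)
            tag = f"({tag}:{weight:g})"
        parts.append(tag)
    return ", ".join(parts)


if __name__ == "__main__":
    # Placeholder names; substitute real artist tags from the list.
    print(artist_prompt({"artist a": 1.0, "artist b": 0.8}))
```

Lowering the weight of one artist relative to another is one way to control how strongly each style shows in a mix.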
2.5d, anime screencap, bold line, sketch, cgi, digital painting, flat colors, smooth shading, minimalistic, ink style, oil style, pastel style
1950s (style), 1960s (style), 1970s (style), 1980s (style), 1990s (style), animification, art nouveau, pinup (style), toon (style), western comics (style), nihonga, shikishi, minimalism, fine art parody
and everything from this group.
Can be used in combinations (with artists too), with weights, both in positive and negative prompts.
Use the full booru tag name with proper formatting, like "karin_(blue_archive)" -> "karin \(blue_archive\)". Use skin tags for better reproduction, like "karin \(bunny\) \(blue_archive\)". An autocomplete extension might be very useful.
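The formatting rule above can be sketched as a small converter. It mirrors the example exactly: underscores outside parentheses become spaces, parentheses are backslash-escaped, and underscores inside parentheses are kept. The function name is hypothetical, and keeping inner underscores is an assumption based on the single example given here.

```python
def booru_tag_to_prompt(tag: str) -> str:
    """Convert a raw booru tag into prompt form.

    "karin_(blue_archive)" becomes "karin \\(blue_archive\\)":
    parentheses are escaped so the UI does not read them as emphasis.
    """
    out = []
    depth = 0  # current parenthesis nesting level
    for ch in tag:
        if ch == "(":
            depth += 1
            out.append("\\(")
        elif ch == ")":
            depth = max(depth - 1, 0)
            out.append("\\)")
        elif ch == "_" and depth == 0:
            out.append(" ")  # underscores outside parentheses become spaces
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(booru_tag_to_prompt("karin_(blue_archive)"))
    print(booru_tag_to_prompt("karin_(bunny)_(blue_archive)"))
```

An autocomplete extension usually does this conversion for you; the sketch is only useful when preparing prompts in bulk.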
Use it in combination with booru tags, which works great. Or use only natural text after the style and quality tags. Or use just booru tags and forget about it; it's all up to you.
The dataset contains over 800k pictures with hybrid natural-text captions made by Opus-Vision, GPT-4o and ToriiGate.
tail censor, holding own tail, hugging own tail, holding another's tail, tail grab, tail raised, tail down, ears down, hand on own ear, tail around own leg, tail around penis, tail through clothes, tail under clothes, lifted by tail, tail biting, tail insertion, tail masturbation, holding with tail, ...
(booru meaning, not e621) and many others with natural text. The majority work perfectly; some require a few rerolls.
You can use extra meta tags to control it:
low brightness, high brightness, low gamma, high gamma, sharp colors, soft colors, hdr, sdr, limited range
They work in both the epsilon and vpred versions, and work really well.
Unfortunately, there is an issue: the model relies on them too much. Without low brightness, low gamma, or limited range (in negative) it might be difficult to achieve true 0,0,0 black; the same is often true for white.
Both the epsilon and vpred versions have something like true ZSNR, with a full range of colors and brightness and without the commonly observed flaws. But they behave differently, so just try both.
It is experimental. There is something wrong (probably with token padding) in the vpred version, either in the model or on the inference side. If you get broken, washed-out pictures like this, put BREAK somewhere in the prompt. This does not happen on dark or bright pictures; to be investigated. Or just use the epsilon version, which already provides the full range and a great experience.
Otherwise, at the moment of release this is probably the only vpred model that runs okay and doesn't suffer from burned colors, limited range, or the need for extra tweaks, rescales, adjustments and so on (default parameters: 1, 2, cfg rescale: 1, 2, 3). It even tends to show the same behaviour as NAI3, with wrong skin colors and large fills of red/yellow/blue under specific prompts. Full experience lmao.
To launch the vpred version you will need a dev build of A1111, Comfy (with a special loader node) or Reforge. Just use the same parameters as for epsilon (Euler a, CFG 5..7, 20..28 steps). CFG rescale is not mandatory, but you can try it and decide whether you like the results.
As mentioned above, to get a full black or full white fill you will need to write a prompt longer than a single tag or use the brightness meta tags.
Of course there are:
As mentioned, the model relies too much on brightness meta tags, so you'll have to use them to get full performance
The vpred version has problems with chunk padding or something else, solved with BREAK
Inferior in furry-related knowledge compared to NoobAi
Some cherry-picked character datasets have prompting issues - Yozora and a few cute fox-girls are not consistent
A small detail-polishing finetune or LoRA would be nice; it's up to the community
To be discovered
Same as Illustrious. Feel free to use it in your merges, finetunes, etc.; just please leave a link.
I'll consider making a report or something like it later.
In short, 98% of the work is related to dataset preparation. Instead of blindly relying on the loss weighting based on tag frequency from the NAI paper, a custom guided loss-weighting implementation along with an asynchronous collator for balancing was used. ZTSNR (or close to it) with epsilon prediction was achieved using noise scheduler augmentation.
First of all, I'd like to acknowledge everyone who supports open source and develops and improves code. Thanks to the authors of Illustrious for releasing the model, and thanks to the NoobAI team for being pioneers in open finetuning at such a scale, sharing experience, and raising and solving issues that previously went unnoticed.
Artists who wish to remain anonymous - sharing private works; Soviet Cat - GPU sponsoring; Sv1. - LLM access, captioning, code; K. - training code; Bakariso - datasets, testing, advice, insights; NeuroSenko - donations, testing, code; T.,[] - datasets, testing, advice; rred, dga, Fi., ello - donations; other fellow brothers who helped. Love you so much ❤️.
And of course everyone who gave feedback and made requests; it's really valuable.
If I forgot to mention anyone, please notify.
If you want to support - share my models, leave feedback, make a cute picture with a kemonomimi girl. And of course, support original artists.
AI is my hobby; I'm spending money on it and not begging for donations. However, it has turned into a large-scale and expensive undertaking. Consider supporting it to accelerate new training and research.
(Just keep in mind that I can waste it on alcohol or cosplay girls)
BTC: bc1qwv83ggq8rvv07uk6dv4njs0j3yygj3aax4wg6c
ETH/USDT(e): 0x04C8a749F49aE8a56CB84cF0C99CD9E92eDB17db
If you can offer GPU time (A100+) - PM.
RouWei is a Safetensors checkpoint created by community user Minthybasis, fine-tuned from Illustrious (Stable Diffusion) for anime image generation.
Major update