🖥️ You are welcome to try the open-source GPT4V-Image-Captioner, developed by my friend and me. It offers one-click installation and integrates several features, including image pre-compression, image tagging, and tag statistics. We also recently launched a webui plugin version of the tool, and everyone is welcome to use it!
🌍 You are welcome to join the QQ group "兔狲的AIGC梦工厂" (Pallas's Cat's AIGC Dream Factory), group number: 835297318 (entry answer: 兔狲). Telegram group "兔狲的SDXL百老汇" (Pallas's Cat's SDXL Broadway): https://t.me/+KkflmfLTAdwzMzI1
This model is an accelerated version of the HelloWorld SDXL base model that incorporates SDXL-Lightning technology. With the Euler a sampler and a CFG scale of 1, it can generate images in 6-8 steps, which is three times faster than the original SDXL version. In side-by-side comparison, its results are also better than those of the LCM or Turbo versions.
The recommended parameters for generating images with this model are:
Sampler: Euler a (Important! The model is specifically adapted to Euler a; other samplers may not yield results as good)
CFG scale: 1
Sampling steps: 8 steps (6~8 steps are acceptable)
Hires algorithm: ESRGAN 4x / 8x_NMKD-Faces_160000_G
Hires Upscale factor: 1.5x
Hires steps: 8 steps
Hires Denoising strength: 0.3
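For users who drive a web UI through its HTTP API, the recommended parameters above can be collected into a request payload. The following is a minimal sketch, not an official configuration: the field names (sampler_name, cfg_scale, enable_hr, hr_upscaler, hr_scale, hr_second_pass_steps, denoising_strength) are assumed to match the AUTOMATIC1111 web UI txt2img API, the upscaler string "ESRGAN_4x" is an assumption that must match a name installed in your UI, and build_txt2img_payload is a hypothetical helper. The code only builds the payload and sends nothing.

```python
import json


def build_txt2img_payload(prompt: str, steps: int = 8) -> dict:
    """Collect the recommended Lightning settings into one payload dict.

    Field names are assumed to follow the AUTOMATIC1111 txt2img API.
    """
    if not 6 <= steps <= 8:
        raise ValueError("the recommended range is 6 to 8 sampling steps")
    return {
        "prompt": prompt,
        "sampler_name": "Euler a",   # the model is adapted to Euler a
        "cfg_scale": 1,
        "steps": steps,
        "enable_hr": True,           # turn on the Hires. fix pass
        "hr_upscaler": "ESRGAN_4x",  # assumed name; check your installed upscalers
        "hr_scale": 1.5,
        "hr_second_pass_steps": 8,
        "denoising_strength": 0.3,
    }


if __name__ == "__main__":
    payload = build_txt2img_payload("a snow leopard on a rocky ridge")
    print(json.dumps(payload, indent=2))
```

Keeping the step check inside the helper makes it harder to reuse the 25-step settings of the non-accelerated versions by accident.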
HelloWorld 5.0 is the most substantial update in the history of the HelloWorld series. Its training set was tagged with GPT-4V, and the model has undergone significant fine-tuning in fields such as science fiction, animals, architecture, and illustration.
Comparative tests show improvements in this version include:
1. More varied and dynamic character poses and image compositions, creating visually engaging pictures;
2. The model has been trained extensively on film-photography data. The film texture was weak from versions 2.0 to 4.0, and many fans missed the leogirl style of version 1.0, so this update specifically strengthens the film texture without compromising other photographic qualities. The film texture can be triggered by phrases such as "film grain texture" and "analog photography aesthetic";
3. Enhanced expressiveness in themes like science fiction, thriller, and animals, with mechas and other subjects having a more designed feel. Animals such as the snow leopard, red panda, giant panda, tiger, Pallas's cat, and domestic cats and dogs are more lifelike;
4. Thanks to GPT tagging, prompt adherence and conceptual accuracy have been further improved.
However, the drawbacks of this version include:
1. As this is a substantial fine-tuning update, the error rate for limbs and similar details may increase slightly, which is normal when a model moves out of its comfort zone into newly optimized areas. Previous versions underwent extensive limb testing and improvement, while the new version had limited time for such work. Nevertheless, limb accuracy in this version is at least higher than in version 1.0, and I will continue to improve it in future updates.
2. Due to the reinforced film texture, even though GPT tagging is as accurate as possible, there can be an unavoidable default warm tone in images. However, you can use prompts like studio light or sharp focus to produce high-definition studio-quality images, and with proper use of prompts, the output can have better skin tones and visual appeal than previous versions.
3. This version includes more full-body character images to enhance the full-body effect, so the model may produce wider scenes than before if no specific character composition is given. Currently, facial details in full-body shots at 1024 resolution may be less sharp than in half-body or close-up shots. This can be improved with ADetailer and a 1.5x Hires. fix at a denoising strength of 0.3, or by specifying the composition in the prompt to avoid generating full-body images.
4. Since a small number of high-quality illustration datasets have been added, there is a chance that prompts related to animated styles will produce animated images. If this concerns you, please adjust your prompts accordingly.
These are the main updates for this version. Training the SDXL base model is challenging, and when the training set approaches ten thousand images, the cost for tagging and training for each model exceeds 300 USD. I welcome everyone to use the model and appreciate any feedback you can provide! If you find this model satisfactory, I would be immensely grateful if you could help spread the word about it.
HelloWorld4.0 is a progressive transitional version from tagging with blip+clip to tagging with GPT4V. I initially trained a pure GPT4V tagging model, and then merged it with a large proportion of the HelloWorld3.2 version and 0.05 proportion of Juggernaut XL (to adjust the skin tone). The new version has shown improvements in prompt compliance and concept coverage compared to the 3.2 version.
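The fusion described above can be pictured as a weighted sum of checkpoint weights. The sketch below is only an illustration of that idea under stated assumptions: it is not the author's actual merge procedure, plain Python lists stand in for real tensors, weighted_merge is a hypothetical helper, and apart from the 0.05 share for Juggernaut XL the proportions used in the example (0.25 and 0.70) are placeholders, because the source only says "a large proportion" for HelloWorld3.2.

```python
from typing import Dict, List

Tensor = List[float]            # stand-in for a real weight tensor
StateDict = Dict[str, Tensor]   # parameter name -> weights


def weighted_merge(models: List[StateDict], weights: List[float]) -> StateDict:
    """Merge checkpoints as a per-parameter weighted sum."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("weights should sum to 1")
    merged: StateDict = {}
    for key in models[0]:
        size = len(models[0][key])
        merged[key] = [
            sum(w * m[key][i] for m, w in zip(models, weights))
            for i in range(size)
        ]
    return merged


if __name__ == "__main__":
    # Toy two-value "checkpoints"; real ones hold thousands of tensors.
    gpt4v_tagged = {"layer.weight": [1.0, 0.0]}
    helloworld_32 = {"layer.weight": [0.0, 1.0]}
    juggernaut_xl = {"layer.weight": [1.0, 1.0]}
    # 0.05 comes from the text; 0.25 and 0.70 are hypothetical placeholders.
    result = weighted_merge(
        [gpt4v_tagged, helloworld_32, juggernaut_xl], [0.25, 0.70, 0.05]
    )
    print(result)
```

With a share as small as 0.05, the third model can only nudge global properties such as skin tone, which fits the stated purpose of including it.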
The new GPT4V tagging training set has doubled from the 4000 images of the helloworld3 series to 8000 images, covering not only portraits but also animals, architecture, nature, food, illustrations, and more. However, the pure GPT4V version encountered an overfitting problem, which is preliminarily attributed to the doubling of the number of training images. One of the next steps in iterative optimization is to find out how to include as many non-portrait concepts as possible while ensuring sufficient training of portraits. At this stage, a fusion of the new and old versions has been used for fine-tuning to ensure a smooth transition between versions, so the expanded concept set and the advantages brought by GPT4V tagging are not very perceptible at the moment. These advantages will become increasingly apparent in the subsequent generations 5 and 6 of the model.
Version 3.2 is an iteration optimized with DPO technology, and compared to version 3.0, there are optimizations in skin tone and limb accuracy, but the improvements are not significant. That's why this version is marked as 3.2 rather than being labeled as 4.0.
The new version has expanded the training set, enhancing the model's ability to express in different artistic styles, including science fiction and art.
It has integrated a self-made quality enhancement LoCon (created using slider technology), to improve image texture and alleviate issues of distortion in fingers and limbs.
Thank you all for your patience. After overcoming various challenges, the HelloWorld 2.0 version is finally ready to be presented to you all in a state that I'm satisfied with. The main differences between HelloWorld 2.0 and 1.0 are as follows:
HelloWorld 2.0 no longer requires trigger words, and the results are comparable in quality to version 1.0 with trigger words. The trigger word 'leogirl' in 1.0 was strongly associated with East Asians. With the trigger word removed, words like '1girl' will still tend to generate East Asian portraits when race is not specified, but you can now specify the race with keywords such as nationality or skin color. For example, the effects of words like 'Chinese', 'Russian', 'Iranian', 'Jamaican', 'Kenyan', 'dark-skinned', 'pale-skinned', etc., are listed below.
You can also get different styles of characters by writing names of people of different countries and genders in the prompt, such as Han Meimei (China), Sophie Martin (France), Priya Patel (India), Fatima Al-Hassan (Arab), Wanjiru Mwangi (Kenya). These prompts are only examples; there are many other usable prompts and approaches, and you are welcome to explore and share your own.
HelloWorld 2.0 has balanced the quality/color and offers more style options. The 1.0 version, when used with 'leogirl', would likely produce images with a strong film texture. HelloWorld 2.0 is no longer tied to a film texture and can be customized with some quality-related prompts. Some prompts that have been tested and work well include:
high-end fashion photoshoot, product introduction photo, popular Korean makeup, aegyo sal, Sharp High-Quality Photo, studio light, medium format photo, Mamiya photography, analog film, Medium Portrait with Soft Light, real-life image, refined editorial photograph, raw photo, real photo, Scanned Photo, film still
The color effects of these prompts are as follows:
The training set for HelloWorld 2.0 significantly increased the proportion of full-body photos to improve SDXL's results when generating full-body and distant-view portraits. Although this has improved compared to version 1.0, it is still strongly recommended to use ADetailer when generating full-body photos. In addition, users with enough VRAM (24 GB) are advised to apply a 1.5x Hires. fix to the image, which can significantly improve facial details.
Special reminder: When using the HelloWorld 1.0 model, please remember to add the trigger word "leogirl".
Distinct from the SD1.5 base model "MoonFilm", "HelloWorld" is a brand-new realistic SDXL base model series. To allow more users to discover HelloWorld, I have retained the original MoonFilm model link. HelloWorld can be seen as a spiritual continuation of MoonFilm on the new SDXL platform, but it aims for more than realism and film-like quality in portraits. Thanks to SDXL's far greater information capacity and text understanding compared to SD1.5, HelloWorld is a base model that seeks to depict all things realistically; in other words, I hope to gradually build a virtual photography world with HelloWorld.
Realistic base models for SD1.5 have reached a fairly mature stage and are unlikely to see significant further performance improvements. Unless a breakthrough technology appears for the SD1.5 platform, the MoonFilm & MoonMix series will largely stop updating, and I will devote most of my energy to developing the HelloWorld SDXL base model. Version 1.0 is now available for download; version 2.0 is under intensive development and is expected in early September.
As a brand new SDXL model, there are three differences between HelloWorld and traditional SD1.5 models:
1. Unlike SD1.5 base models, which typically do not use trigger words, HelloWorld 1.0 should be used with the trigger word "leogirl". This lets the SDXL model reproduce the training-set effect more reliably.
2. The HelloWorld model supports direct output at a resolution of 1024*1024 pixels, with no need for high-resolution upscaling. The quality of directly generated close-up portraits is not inferior to the SD1.5 version, but directly generated distant portraits still have flaws. It is therefore suggested to use the ADetailer plugin, which can effectively correct problems with distant faces.
3. SDXL makes it easier to generate images from simple natural-language prompts. It is recommended to use more natural-language prompts, which give better results when generating realistic AI photos.
After multiple rounds of testing, the suggested drawing parameter settings are:
Steps ≥ 25
Sampler: DPM++ 2M Karras
CFG scale: 10
Size ≥ 1024x1024
ADetailer: enabled
Everyone is welcome to try HelloWorld and provide plenty of feedback. Your valuable opinions are very important for the next step of model improvement!
The HelloWorld series of models (hereinafter "the Model") has been crafted by myself (hereinafter "the Owner") with the assistance of the LiblibAI platform. Republishing the Model on platforms other than LiblibAI and Civitai is not authorized by the Owner.
The Owner permits the use of images generated by the Model for non-commercial educational or informative purposes at no cost, on the condition that:
- Users adhere to applicable laws and do not violate the rights of the Model or any third party.
- Attribution for the images must be clearly stated as "created by LEOSAM's HelloWorld base model".
For any form of commercial utilization, a prior commercial license agreement with the Owner is required. For inquiries related to commercial licensing and model personalization, please reach out to the Owner via the contact information available on the Owner's homepage.
The development and free distribution of the SDXL model represent significant endeavors. The Owner pledges ongoing complimentary updates to the HelloWorld model for individual enthusiasts as a token of appreciation for the community's contributions to open-source development. Collaborative commercial engagements are vital for the Model's advancement and refinement. The Owner appreciates every user for their understanding and support.
Unauthorized use may breach applicable laws and carry legal repercussions. The Owner retains exclusive rights to interpret this statement, which is governed by prevailing laws and regulations.
LEOSAM's HelloWorld XL is a specialized image generation AI model, distributed as a Safetensors / Checkpoint file and created by AI community user LEOSAM. Derived from the Stable Diffusion (SDXL 1.0) model, LEOSAM's HelloWorld XL has undergone an extensive fine-tuning process on a dataset consisting of images generated by other AI models or user-contributed data. This fine-tuning targets the use cases the model was designed for, which are tagged as photorealistic, base model, and photo.
With a rating of 4.89 and over 557 ratings, LEOSAM's HelloWorld XL is a popular choice among users for generating high-quality images from text prompts.
To use LEOSAM's HelloWorld XL, download the model checkpoint file and set up a UI for running Stable Diffusion models (for example, AUTOMATIC1111). Then, provide the model with a detailed text prompt to generate an image. Experiment with different prompts and settings to achieve the desired results. If this sounds a bit complicated, check out our initial guide to Stable Diffusion; it might be of help. And if you really want to dive deep into AI image generation and understand how to set up AUTOMATIC1111 to use Safetensors / Checkpoint AI models like LEOSAM's HelloWorld XL, check out our crash course in AI image generation.
Many improvements were attempted while producing this version. The main ones are listed below for reference:
The training materials were further curated, but the total volume remains at 500 training images + 1,500 regularization images. The proportions of full-body photos, male photos, high-definition texture photos, and photos of different races have been increased.
The word library used for CLIP tagging originally contained about 110,000 phrases, including a large number of errors, garbled entries, and duplicates. Through batch corrections with GPT-4, multiple rounds of test tagging, and manual additions and deletions, the library was reduced to about 40,000 entries, and a large number of phrases related to photography, portraits, and China were added.
A large number of comparative tests were conducted, including: a. the difference between training SDXL with DreamBooth and first training an SDXL LoRA and then merging it into the base model, using the same training set; b. the differences in training results across three optimizers (Adafactor, AdamW8bit, Prodigy), different LR schedulers, different learning rates, and different batch sizes; c. the differences in results with and without various training-set data augmentation methods; d. the training results on different SDXL base models.
Before bucketing, the training set was batch-processed: images were compressed and cropped into seven target resolution groups: (768, 1360), (832, 1248), (864, 1184), (1024, 1024), (1184, 864), (1248, 832), (1360, 768). This is intended to improve subsequent training at large batch sizes (although the improvement seems limited).
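The pre-bucketing step can be sketched as picking, for each image, the target group with the closest aspect ratio and then scaling the image so that it fully covers that group before a center crop. This is a minimal illustration under assumptions, not the author's preprocessing script: the tuples are read as (width, height), closeness is measured on the log of the aspect ratio, and nearest_bucket and cover_size are hypothetical helper names.

```python
import math
from typing import Tuple

# The seven target resolution groups from the text, read as (width, height).
BUCKETS = [
    (768, 1360), (832, 1248), (864, 1184), (1024, 1024),
    (1184, 864), (1248, 832), (1360, 768),
]


def nearest_bucket(width: int, height: int) -> Tuple[int, int]:
    """Return the bucket whose aspect ratio is closest to the image's."""
    target = math.log(width / height)
    return min(BUCKETS, key=lambda b: abs(math.log(b[0] / b[1]) - target))


def cover_size(width: int, height: int, bucket: Tuple[int, int]) -> Tuple[int, int]:
    """Size after scaling so the image covers the bucket (crop comes next)."""
    bw, bh = bucket
    scale = max(bw / width, bh / height)
    # max() guards against rounding one pixel below the bucket size.
    return max(bw, round(width * scale)), max(bh, round(height * scale))


if __name__ == "__main__":
    for size in [(1920, 1080), (1080, 1920), (3000, 2000), (1000, 1000)]:
        bucket = nearest_bucket(*size)
        print(size, "->", bucket, "resize to", cover_size(*size, bucket))
```

Comparing log ratios treats a portrait image and its landscape mirror symmetrically, so 1080x1920 and 1920x1080 land in mirrored groups.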
These are the main updates for HelloWorld 2.0. There were quite a few pitfalls along the way, but the good news is that I have figured out how to train the SDXL base model, so future updates should be much smoother.