[AIDv2.10] Anime Illust Diffusion / 动漫插画扩散

CHECKPOINT
Reprint



Version Detail

SD 1.5
Model Introduction (English)

0 Foreword

I grew tired of the monotonous faces, poses, and styles produced by traditional AI drawing systems, so I wanted to break away from the hybrid models. Initially I worked with prompts alone, but I couldn't achieve the subtle lines, colors, lighting, textures, composition, or storytelling that I wanted. I couldn't even reproduce the stunning styles that the models occasionally produced by chance. Those fleeting results differed only slightly from the usual output, but they were aesthetically captivating. I therefore wanted a model that could learn artistic styles faithfully and produce outputs in those styles consistently. I started collecting training data in November 2022 and used special tags to separate styles that differed only subtly within the dataset. Eventually, in early 2023, the model developed a distinct style of its own, which became known as AIDv1.0.

Why choose fine-tuning rather than LoRA? I have always believed that fine-tuning gives better results than LoRA: it does not depend on a base model, and all training images progress together towards the point of lowest loss, rather than merely optimizing an additional set of weights. That said, I have also explored ways to fold specific styles into large models seamlessly in order to reduce the training burden.

Over the following six months I spent more than $2500. I collected and labeled all the training data myself and modified the training scripts to meet my needs. The training runs ranged from thousands to millions of steps, and the training hardware grew from an RTX 3060 and an RTX 3090 to an A100. AID gradually evolved into a full engineering project. Along the way I found that the model learns styles best when it slightly "overfits" to the noise. So I deliberately overtrained every style and used negative embeddings, trained on the noise, as a form of regularization to balance the learning progress between styles. This approach worked well for me: a well-tuned negative embedding not only preserves the style of the base model but also strengthens the style's features.

As the model iterated, I believe I reached the limit of Stable Diffusion 1.5. Even with fine-tuning, the model struggled to imitate the contours, colors, lighting, composition, and storytelling of the best styles. From underfitting to overfitting, I could never capture perfectly stylized features, especially given that the model has to optimize for over a hundred artistic styles at once. I am therefore eagerly awaiting more capable SDXL models to provide new solutions.

During training I did not focus on writing complex prompts or testing different styles. Some of my friends achieved impressive results by combining LoRAs with highly complex prompts, and I am grateful for their innovation and support. Finally, thanks to @BananaCat for the localization of this introduction. I am happy to share my results with everyone. If you are interested in more engineering details about preprocessing the training data or the parameters, or would like to share something with me, please leave a message in the comments and I will reply as soon as possible.

I Introduction

AnimeIllustDiffusion is a pre-trained, non-commercial, multi-style anime illustration model. It DOES NOT generate the generic "AI face". You can use trigger words (see Appendix A) to generate images in specific styles. Because it packs in so much content, AID needs plenty of negative prompts to work properly. If you get noisy images when generating (most outputs will be noisy otherwise), you must use the model together with my negative text embedding [1] to cancel the noise; this is crucial, and without it you will get bad results. For the VAE, I recommend sd-vae-ft-mse-original [5]. Part II of this introduction describes how the model was made, part III presents my negative text embeddings, and Appendix A provides a partial list of keywords. Please read the version information carefully before downloading!

The model covers over 100 stable anime illustration styles and 100 anime characters. See Appendix A for the style trigger words. To generate a specific character, simply use the character's name directly in the prompt. The AID model is like a palette: you can create new styles by combining different prompts.

1 Suggested Parameters

Sampler: Euler a
Steps: 32
Resolutions: 512x768, 640x960, 768x1152, etc.
CLIP skip: 1
Prompt format: best quality, masterpiece, highres, by {xxx}, best lighting and shadow, stunning color, radiant tones, ultra-detailed, amazing illustration, an extremely delicate and beautiful, {other prompts}, where by {xxx} names the style (trigger words in Appendix A).
Negative prompt format: aid210, {other negative prompts}, where aid210 is the special negative embedding, which you can download and learn to use from [1].
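As a rough illustration of these settings, here is a minimal text-to-image sketch using the diffusers library. The checkpoint and embedding file names below are placeholders for wherever you saved the downloads (not official file names), and CLIP skip 1 is simply the diffusers default:

```python
# Minimal sketch of the suggested settings with diffusers (file names are placeholders).
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

pipe = StableDiffusionPipeline.from_single_file(
    "animeIllustDiffusion_v210.safetensors",          # placeholder checkpoint path
    torch_dtype=torch.float16,
).to("cuda")

# Euler a sampler, as recommended above.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

# Register the negative embedding so "aid210" can be used in the negative prompt.
pipe.load_textual_inversion("aid210.pt", token="aid210")  # placeholder embedding path

prompt = (
    "best quality, masterpiece, highres, by {xxx}, best lighting and shadow, "
    "stunning color, radiant tones, ultra-detailed, amazing illustration, "
    "an extremely delicate and beautiful, 1girl"      # replace "by {xxx}" with a style trigger word
)
negative_prompt = "aid210, lowres, bad anatomy"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=32,
    width=512,
    height=768,
).images[0]
image.save("aid_sample.png")
```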
2 Version Comparisons

Each version of AID has its own strengths; a newer version is not automatically better.

For beginners: v2.8, v2.91 - Weak, v2.10beta1
Great creativity: v2.6, v2.7, v2.91 - Weak, v2.91 - Strong
Relatively stable: v2.5, v2.6, v2.8, v2.91 - Weak
Various styles: v2.91 - Weak, v2.91 - Strong, v2.10beta1

If you would like to upload and share your own images, or contribute training images for future AID models, please head to: anime-illust-diffusion-gallery - a Hugging Face Space by Eugeoter.

II Model

This model is a fusion of three different models: two that I trained and the Pretty 2.5D model merged by GoldSun [2].

1 Model Training

I used 4300+ manually cropped, tagged, 512x512 anime illustration images as the training set and used DreamBooth to fine-tune the Naifu 7G model. I trained for 100 epochs per training image with a high learning rate, without regularization images, and I also trained the text encoder. If you are interested, you can find detailed parameter information at [3].

2 Model Merging

I merged the three models with Merge Block Weighted to create this AnimeIllustDiffusion model. One model provides the style and the text encoder (base alpha and all OUT blocks), one model improves hand details (IN blocks 00-05), and the third (Pretty 2.5D [2]) provides better composition (IN blocks 06-11 and the M00 block).
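For reference, this recipe maps onto the per-block weights used by the Merge Block Weighted extension (base_alpha, IN00-IN11, M00, OUT00-OUT11). The sketch below only illustrates which donor model dominates each block group as described; the exact alpha values used for AID are not published:

```python
# Illustrative layout of the Merge Block Weighted recipe described above.
# Block names follow the extension's convention; actual alpha values are not published.

style_blocks = ["base_alpha"] + [f"OUT{i:02d}" for i in range(12)]    # style + text encoder
hand_blocks = [f"IN{i:02d}" for i in range(6)]                        # IN00-IN05: hand details
composition_blocks = [f"IN{i:02d}" for i in range(6, 12)] + ["M00"]   # IN06-IN11 + M00: composition

block_sources = {}
block_sources.update({b: "style model" for b in style_blocks})
block_sources.update({b: "hand-detail model" for b in hand_blocks})
block_sources.update({b: "Pretty 2.5D" for b in composition_blocks})

ordered = (["base_alpha"] + [f"IN{i:02d}" for i in range(12)]
           + ["M00"] + [f"OUT{i:02d}" for i in range(12)])
for block in ordered:
    print(f"{block}: {block_sources[block]}")
```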
III Negative Text Embedding

The model is recommended to be used with badv3, a text embedding of negative prompt words. It not only simplifies writing prompts but also draws out the model's potential and improves the quality of the generated images. In most cases badv3 is enough and you do not need to add extra quality prompt words, but it does not solve 100% of the image problems.

1 How to Use It

Place the downloaded negative text embedding file, badv3.pt, in the embeddings folder of your Stable Diffusion directory. After that, simply enter badv3 in the negative prompt field.

2 Ideas on It

My idea is to train a concept of "bad images" and put it into the negative prompt so that such images are avoided. I trained the negative text embedding badv3 on a few hundred bad images generated by the model; it works in a similar way to EasyNegative [4]. I deliberately trained it towards overfitting to reduce the effect that conventional negative text embeddings have on the model's style, and that seemed to work: badv3 works better for this model than EasyNegative does. I have not compared other negative text embeddings yet. badv3 is the nth negative embedding I trained after deformityv6; such embeddings are easy to make, but the results are fairly random. I have also tried removing weights from another model trained on bad images via add-difference merging, but so far without promising results. My next plan is to train a negative LoRA instead of negative text embeddings, to directly "remove" some of the weights from the model rather than merely "avoid" them.
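For anyone who wants to try the same idea, a negative embedding of this kind can be trained with ordinary textual inversion on a folder of bad outputs. The following is a condensed, illustrative sketch only; the model path, the token name "mybadneg", the image folder, and the hyperparameters are placeholders, not the settings actually used for badv3:

```python
# Condensed textual-inversion sketch: learn one token embedding from "bad" images,
# then use that token in the negative prompt. Paths and hyperparameters are placeholders.
from pathlib import Path

import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model_dir = "path/to/sd15-base"  # placeholder: diffusers-format base model
tokenizer = CLIPTokenizer.from_pretrained(model_dir, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_dir, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(model_dir, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_dir, subfolder="unet")
noise_scheduler = DDPMScheduler.from_pretrained(model_dir, subfolder="scheduler")

# Register a new placeholder token whose embedding will be learned.
placeholder = "mybadneg"  # hypothetical token name
tokenizer.add_tokens(placeholder)
token_id = tokenizer.convert_tokens_to_ids(placeholder)
text_encoder.resize_token_embeddings(len(tokenizer))
embeddings = text_encoder.get_input_embeddings().weight

# Freeze everything except the token embedding table.
vae.requires_grad_(False)
unet.requires_grad_(False)
text_encoder.requires_grad_(False)
embeddings.requires_grad_(True)
optimizer = torch.optim.AdamW([embeddings], lr=5e-3)
frozen = embeddings.detach().clone()

preprocess = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])
bad_images = sorted(Path("bad_images").glob("*.png"))  # folder of bad generations

for step in range(1000):
    img = Image.open(bad_images[step % len(bad_images)]).convert("RGB")
    pixels = preprocess(img).unsqueeze(0)
    latents = vae.encode(pixels).latent_dist.sample() * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps, (1,))
    noisy = noise_scheduler.add_noise(latents, noise, t)

    ids = tokenizer(placeholder, padding="max_length", truncation=True,
                    max_length=tokenizer.model_max_length, return_tensors="pt").input_ids
    cond = text_encoder(ids)[0]
    pred = unet(noisy, t, encoder_hidden_states=cond).sample
    loss = F.mse_loss(pred, noise)  # standard epsilon-prediction objective

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    with torch.no_grad():  # keep every embedding except the new token fixed
        keep = torch.ones(embeddings.shape[0], dtype=torch.bool)
        keep[token_id] = False
        embeddings[keep] = frozen[keep]

# Save the learned vector; converting it to the webui .pt embedding format is a separate step.
torch.save(embeddings[token_id].detach().cpu(), "mybadneg_vector.pt")
```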

IV Declarations

This model is intended for testing multi-style model training; it is non-profit and not tied to any commercial interest. If there is any infringement, it will be deleted immediately. All cover images were generated with text2image, without any LoRA, using the negative text embedding from [1] in the negative prompts. Users are only authorized to use this model to generate pictures; unauthorized reproduction is not allowed, and any commercial use of this model is strictly prohibited! The display images in Appendix A are broad categories of this model's special trigger words and are for reference only. Please do not use this model to generate bloody, violent, or pornographic images, or any infringing content! For this reason, only part of the trained trigger words are provided in Appendix A.

V Referenced Pages

Project Permissions

Model reprinted from: https://civitai.com/models/16828?modelVersionId=116468

Reprinted models are for communication and learning purposes only, not for commercial use. Original authors can contact us to transfer the models through our Discord channel --- #claim-models.
