Kirazuri (Anima) - 3.0

Kirazuri (Anima)

CHECKPOINT
Original



Kirazuri (Anima) by motimalu on Tensor.Art

Kirazuri (Anima)

Version 3.0 (Latest)

For in-depth details of version 3.0 training and tooling, see: Kirazuri (Anima) 3.0 Training Diary

Training Details Summary

Trainer: diffusion-pipe commit b0aa4f1e03169f3280c8518d37570a448420f8be

Training device: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition

Total training time: ~10 days

Total samples seen (unbatched steps): ~2,550,000

Training resolutions:

  • 512^2

  • 768^2

  • 1024^2

  • 1280^2

  • 1536^2

Stage 1

  • Samples seen (unbatched steps): ~2,000,000

  • Training time: ~125 hrs

  • Learning Rate: 6e-6

  • Learning Rate Scheduler: Cosine

  • LLM Adaptor Learning Rate: 8e-7

  • Precision: Mixed BF16

  • Optimizer: AdamW8bit with Kahan Summation

  • Weight Decay: 0.01

  • Timestep Sampling Strategy: Logit-Normal
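Logit-normal timestep sampling can be sketched as below: a normal sample is passed through a sigmoid, which concentrates training on mid-range timesteps. This is a minimal illustration only; the function name and the `mean` and `std` defaults are assumptions, since the summary does not state the values used in training.

```python
import math
import random


def sample_logit_normal_timestep(
    rng: random.Random, mean: float = 0.0, std: float = 1.0
) -> float:
    """Draw t in (0, 1) by squashing a normal sample through a sigmoid.

    mean and std are illustrative defaults, not values taken from the
    training configuration.
    """
    z = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
timesteps = [sample_logit_normal_timestep(rng) for _ in range(10_000)]

# Fraction of samples landing in the middle half of the timestep range.
mid_fraction = sum(0.25 < t < 0.75 for t in timesteps) / len(timesteps)
print(f"fraction of timesteps in (0.25, 0.75): {mid_fraction:.2f}")
```

With a uniform sampler the printed fraction would be about 0.5; the logit-normal sampler puts noticeably more mass in the middle of the range.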

Stage 2

  • Samples seen (unbatched steps): ~550,000

  • Training time: ~118 hrs

  • Learning Rate: 3e-6

  • Learning Rate Scheduler: Cosine

  • LLM Adaptor Learning Rate: 0

  • Flux Shift: Enabled

  • Multi-Scale Loss Weight: 0.5

  • Precision: Mixed BF16

  • Optimizer: AdamW8bit with Kahan Summation

  • Weight Decay: 0.01

  • Timestep Sampling Strategy: Logit-Normal
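For readers unfamiliar with "Flux Shift", the sketch below follows the resolution-dependent timestep shift from the Flux reference sampling code: larger images get a larger shift, which pushes sampled timesteps toward the high-noise end. Whether the trainer applies it in exactly this form to Anima is an assumption, as are the 16-pixels-per-token figure and the helper names; the summary only states that the option was enabled.

```python
import math


def flux_shift_mu(
    image_seq_len: int,
    base_seq_len: int = 256,
    max_seq_len: int = 4096,
    base_shift: float = 0.5,
    max_shift: float = 1.15,
) -> float:
    """Linearly interpolate (or extrapolate) the shift exponent from sequence length."""
    slope = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    return base_shift + slope * (image_seq_len - base_seq_len)


def shift_timestep(t: float, mu: float) -> float:
    """Flux-style time shift with sigma = 1; t must lie in (0, 1]."""
    return math.exp(mu) / (math.exp(mu) + (1.0 / t - 1.0))


def seq_len_for(side_px: int, px_per_token: int = 16) -> int:
    """Token count for a square image. px_per_token = 16 is an assumption
    (8x latent downsampling followed by 2x2 patches, as in Flux)."""
    return (side_px // px_per_token) ** 2


# Stage 2 resolutions from the summary above.
for side in (1024, 1280, 1536):
    mu = flux_shift_mu(seq_len_for(side))
    print(f"{side}^2: mu = {mu:.3f}, t = 0.5 shifts to {shift_timestep(0.5, mu):.3f}")
```

The shift grows with resolution, so the 1536^2 bucket sees noisier timesteps on average than the 1024^2 bucket under these assumptions.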

Additional Features

  • Tag Dropout: 30% with protected first 8 tags

  • Tag Shuffle: Applied to the trailing unprotected tags

  • Natural Language: Short and Long Caption variants
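The tag dropout and shuffle described above can be sketched as follows. This is one plausible reading, not the trainer's actual code: it assumes dropout is applied independently per tag at 30%, that dropout happens before the shuffle, and that the first 8 tags are kept in place. The function name and the example tags are hypothetical.

```python
import random


def augment_tags(
    tags: list[str],
    rng: random.Random,
    protected: int = 8,
    dropout: float = 0.30,
) -> list[str]:
    """Keep the first `protected` tags fixed, then drop and shuffle the rest."""
    head = list(tags[:protected])
    # Each unprotected tag survives with probability 1 - dropout (assumed per-tag).
    tail = [tag for tag in tags[protected:] if rng.random() >= dropout]
    rng.shuffle(tail)
    return head + tail


# Hypothetical caption tags, for illustration only.
example_tags = [f"tag{i}" for i in range(1, 17)]
augmented = augment_tags(example_tags, random.Random(0))
print(", ".join(augmented))
```

Protecting the leading tags keeps the most important descriptors stable in every caption, while the dropped and reordered tail discourages the model from relying on tag position or on any single trailing tag.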

Changes from Kirazuri (Anima) v2.0

  • Dataset adds 7,071 recently curated images, increasing the total size from 35,537 to 42,608 images

  • Dataset cutoff is now 2026/05/12.

  • Trained at 5 total resolutions in two-stage training

    • Stage 1 - 512^2, 768^2, 1024^2

    • Stage 2 - 1024^2, 1280^2, 1536^2

  • Introduced a cosine learning rate scheduler for a smooth learning rate transition between training stages

  • Re-captioned the full dataset with an updated captioning script to add a second natural language caption variant
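The cosine learning rate scheduler mentioned in the list above can be sketched as a plain cosine decay. The example assumes Stage 1 decays from its 6e-6 peak toward the 3e-6 rate at which Stage 2 starts; that floor value, the function name, and the use of samples seen as the step unit are assumptions for illustration, not settings confirmed by the summary.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Stage 1 figures from the summary; the 3e-6 floor is an assumed handoff value.
stage1_samples = 2_000_000
for samples_seen in (0, 500_000, 1_000_000, 1_500_000, 2_000_000):
    lr = cosine_lr(samples_seen, stage1_samples, peak_lr=6e-6, min_lr=3e-6)
    print(f"{samples_seen:>9,} samples: lr = {lr:.2e}")
```

Because the curve flattens near both ends, the rate changes slowly at the start and end of a stage, which avoids an abrupt step when the next stage begins.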

Recognitions

  • Thanks to Circlestone Labs for the Anima Preview base model.

  • Thanks to tdrussell of Circlestone Labs for the diffusion-pipe trainer.

  • Thanks to bluvoll for support using their fork of diffusion-pipe.

  • Thanks to narugo1992 and the deepghs team for open-sourcing various training sets, image processing tools, and models.

License

This model is released under the same license as the base model.

See the base model for details of the CircleStone Labs Non-Commercial License.

Version Detail

Anima
Version 3.0 (Latest). For in-depth details of version 3.0 training and tooling, see the Kirazuri (Anima) 3.0 Training Diary: https://github.com/motimalu/diffusion-training-configs/blob/main/diffusion-pipe/anima/notes/kirazuri3.0-notes.md

Project Permissions

    Use Permissions

  • Use in TENSOR Online

  • As an online training base model on TENSOR

  • Use without crediting me

  • Share merges of this model

  • Use different permissions on merges

    Commercial Use

  • Sell generated contents

  • Use on generation services

  • Sell this model or merges
