Tensor.Art
Create

FREE online image generator and model hosting site!

Grab your 3-day PRO!

Creation

Get started with Stable Diffusion!
💥 SD3 & DiT

ComfyFlow

ComfyUI's amazing experience!
🎭 TAttoo Event

Host My Model

Share my models, get more attention!
💸 Double Earnings

Online Training

Make LoRA Training easier!
🤖 Make Fun

AI Tools


Articles

Understanding the Impact of Negative Prompts: When and How Do They Take Effect?

📝 Source: Synthical. "The Dynamics of Negative Prompts in AI: A Comprehensive Study" by Yuanhao Ban (UCLA), Ruochen Wang (UCLA), Tianyi Zhou (UMD), Minhao Cheng (PSU), Boqing Gong, and Cho-Jui Hsieh (UCLA).

This study addresses the gap in understanding the impact of negative prompts in AI diffusion models. By focusing on the dynamics of diffusion steps, the research aims to answer the question: "When and how do negative prompts take effect?" The investigation categorizes the mechanism of negative prompts into two primary tasks: noun-based removal and adjective-based alteration.

The role of prompts in AI diffusion models is crucial for guiding the generation process. Negative prompts, which instruct the model to avoid generating certain features, have been studied far less than their positive counterparts. This study provides a detailed analysis of negative prompts, identifying the critical steps at which they begin to influence the image generation process.

Findings: Critical Steps for Negative Prompts

Noun-based removal: The influence of noun-based negative prompts peaks at the 5th diffusion step. At this critical step, the negative prompt initially generates the target object at a specific location within the image and then neutralizes the positive noise through a subtractive process, effectively erasing the object. However, introducing a negative prompt in the early stages paradoxically results in the generation of the specified object. The optimal timing for introducing these prompts is therefore after the critical step.

Adjective-based alteration: The influence of adjective-based negative prompts peaks around the 10th diffusion step. During the initial stages, the absence of the object leads to a subdued response. Between the 5th and 10th steps, as the object becomes clearer, the negative prompt accurately focuses on the intended area and maintains its influence.

Cross-Attention Dynamics

At the peak around the 5th step for noun-based prompts, the negative prompt attempts to generate objects in the middle of the image, regardless of the positive prompt's context. As this process approaches its peak, the negative prompt begins to assimilate layout cues from its positive counterpart and tries to remove the object; this represents the zenith of its influence. For adjective-based prompts, during the peak around the 10th step, the negative prompt maintains its influence on the intended area, accurately targeting the object as it becomes clear.

The study highlights the paradoxical effect of introducing negative prompts in the early stages of diffusion, which leads to the unintended generation of the specified object. This finding suggests that the timing of negative prompt introduction is crucial for achieving the desired outcome.

Reverse Activation Phenomenon

A significant phenomenon observed in the study is reverse activation. This occurs when a negative prompt, introduced early in the diffusion process, unexpectedly leads to the generation of the object it names. To explain this, the researchers borrowed the concept of the energy function from Energy-Based Models to represent the data distribution. Real-world distributions often feature elements like clear blue skies or uniform backgrounds, alongside distinct objects such as the Eiffel Tower. These elements typically have low energy scores, making the model inclined to generate them.

The energy function is designed to assign lower energy to more "likely" or "natural" images according to the model's training data, and higher energy to less likely ones. A positive difference indicates that the presence of the negative prompt effectively induces the inclusion of this component in the positive noise: the negative prompt promotes the formation of the object within the positive noise. Without the negative prompt, implicit guidance is insufficient to generate the intended object; the application of a negative prompt intensifies the distribution guidance towards the object, preventing it from materializing as intended. As a result, negative prompts typically do not attend to the correct place until around step 5, well after the positive prompts have taken effect. Using negative prompts in the initial steps can significantly skew the diffusion process, potentially altering the background.

Conclusions

Do not use fewer than about 10 sampling steps; going beyond about 25 steps makes no practical difference for negative prompting. Negative prompts can enhance your positive prompts, depending on how well the model and LoRA have learned their keywords, so they can be understood as an extension of their counterparts. Weighting up negative keywords may cause reverse activation and break your image; try to keep the relative influence of all your LoRAs and models equal.

Reference: https://synthical.com/article/Understanding-the-Impact-of-Negative-Prompts%3A-When-and-How-Do-They-Take-Effect%3F-171ebba1-5ca7-410e-8cf9-c8b8c98d37b6?
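For readers who want the "subtractive process" in symbols: most Stable Diffusion frontends implement negative prompts by substituting the negative-prompt conditioning for the empty (unconditional) prompt in classifier-free guidance. This is a standard formulation, not the paper's exact notation:

$$\hat{\epsilon}_\theta(x_t) = \epsilon_\theta(x_t, c_{\text{neg}}) + w \cdot \bigl(\epsilon_\theta(x_t, c_{\text{pos}}) - \epsilon_\theta(x_t, c_{\text{neg}})\bigr)$$

where $c_{\text{pos}}$ and $c_{\text{neg}}$ are the positive and negative prompt embeddings and $w$ is the CFG scale. Under this reading, the paper's advice to delay the negative prompt amounts to using the plain empty prompt $\varnothing$ in place of $c_{\text{neg}}$ for roughly the first five steps, then switching to $c_{\text{neg}}$ once the layout has formed.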
What exactly are the "node" and the "workflow" in an AI image platform? (An explanation for beginners)

The Traditional Way of Generating AI Images for the Beginner

If you are a beginner in the AI community, you may be confused and have no clue what a "Node" or a "Workflow" is, or how they relate to "AI Tools" on TensorArt.

To start in the simplest way, we first need to mention how a user generates an image with the "Remix" button, which brings us to the normal Creation menu. There you simply edit the prompt (what you would like your picture to look like) and the negative prompt (what you do not want to see in the output image), then push the Generate button, and the AI will kindly draw a new illustration for you within a minute. That sounds great, doesn't it, when you imagine how much time humans used to spend publishing a single piece of art? (Although, in my personal opinion, in 2024 AI and human abilities are still not fully interchangeable, especially when it comes to beautiful, perfect hands. :P)

However, behind the user-friendly menu that lets us "Select model", "Add LoRA", "Add ControlNet", "Set the aspect ratio (the original size of the image)" and so on, each of these options is actually a "Node" inside a rather complex "Workflow".

PS.1. The Checkpoint and the Model often refer to the same thing: the core program that has been trained to draw the illustration. Each one has its strengths and weaknesses (e.g. anime-oriented or realism-oriented).
PS.2. A LoRA (Low-Rank Adaptation) is like an add-on to the model that lets it adapt to a different style, theme, or user preference. A concrete example is an anime-character LoRA.
PS.3. ControlNet is like a condition setting for the image. It helps the model understand what the text prompt alone cannot describe, for instance how a character poses in each direction, or the angle of the camera.

So here comes "ComfyFlow" (the nickname of the Workflow editor; people also call it "ComfyUI"), which gave me a terrible headache the first time I saw something like it! (The image shown is a flow I spent a lot of time studying; it combines the contents of two images into a single one.) Maybe it is my own fault for not taking a class on workflows from the beginning or searching for a tutorial on YouTube (English is not my first language), but wouldn't it be better if we had an instructor to walk us through it step by step here on Tensor.Art? That is what inspired me to write this article solely for beginners. So let's start with the main content.

What is ComfyFlow?

ComfyFlow, or the Workflow, is an innovative AI image-generating platform that allows users to create stunning visuals with ease. To get the most out of this tool, it's important to understand two key concepts: "workflow" and "node". Let's break these down in the simplest way possible.

What is a Workflow?

A workflow is like a blueprint or a recipe that guides the creation of an image. Just as a recipe outlines the steps to make a dish, a workflow outlines the steps and processes needed to generate an image. It's a sequence of actions that the AI follows to produce the final output. Think of it like this: the recipe (workflow) tells you what ingredients to use and in what order; the ingredients (nodes) are the individual steps or components used in the recipe.

Despite the pre-set templates that TensorArt kindly provides, from a beginner's viewpoint, without any knowledge of workflows they are not that helpful, because after clicking the "Try" button we are bombarded with the complexity of the nodes!

What is a Node?

Nodes are the building blocks of a workflow. Each node represents a specific action or process that contributes to the final image. In ComfyFlow, nodes can be thought of as individual steps in the workflow, each performing a distinct function. Imagine nodes as pieces of a puzzle: individual pieces that fit together to complete the picture (the workflow).

How Do Workflows and Nodes Work Together?

1-2) Starting point: Every workflow begins with initial nodes, which might be an image input from the user, together with a Checkpoint and LoRAs serving as image references.
3-4) Processing nodes: These nodes draw or modify the image in some way, such as adding color or texture, or applying filters.
5) Ending point: The final node outputs the completed image, working closely with the previous stage's nodes for sampling and the VAE.

PS. A Variational Autoencoder (VAE) is a generative model that learns from input data, such as images, in order to reconstruct them and generate new, similar images or variations based on the patterns it has learned.

Here is the list of nodes I used for an ordinary image of my waifu, with one checkpoint and two LoRAs, to help you understand how ComfyFlow works. The numbers 1-5 represent the overall flow and the role of each type of node mentioned above. In more complex tasks such as AI Tools, however, the number of nodes can easily exceed 30! By the way, when starting from an empty ComfyFlow page, you add a node by right-clicking -> "Add Node" and then browsing the list; the most frequently used nodes are near the top.

1) loaders -> Load Checkpoint
As in the normal task-creation menu, this node is where we choose the Checkpoint, or core model. It is important to note that nodes work together through inputs and outputs: the "MODEL/CLIP/VAE" output circles have to connect to the corresponding inputs of the next node. We link them by left-clicking inside a circle and dragging to the destination.
PS. CLIP (Contrastive Language-Image Pre-training) is a model developed by OpenAI that links images and text together in a way that helps the AI understand and generate images based on textual descriptions.

2) loaders -> Load LoRA
The checkpoint is closely related to the LoRA, which is why they are connected through the inputs/outputs named "model/MODEL" and "clip/CLIP". Since this example uses two LoRAs (the first for the theme of the picture and the second as a character reference for my waifu), the two LoRA nodes also have to be chained together. Here we can adjust the strength (weight) of each LoRA, just as in the normal task-generation menu.

3) CLIP Text Encode (Prompt)
This node is the prompt and negative prompt we normally see in the menu. The only input here is "clip" (Contrastive Language-Image Pre-training), and the output is "CONDITIONING".
User tip: if you click on an output circle of the "Load LoRA" node and drag it to an empty area, ComfyFlow will pop up a list of compatible next nodes so you can create the new one with ease.

4) KSampler & Empty Latent Image
The sampling method tells the AI how it should start generating visual patterns from the initial noise, and everything related to its adjustment is set in this sampling node, together with the "Empty Latent Image" node. The inputs at this step are the model (from the LoRA node) and the positive and negative conditioning (from the prompt nodes), and the output is "LATENT".

5) VAE Decode & final output node
Once the sampling node is set up, its "LATENT" output connects to "samples", while "vae" links this node back to the "Load Checkpoint" node from the beginning. When everything is done, the "IMAGE" output is the final result, served right into your hands.

PS. An AI Tool is a more complex workflow created for a specific task, such as swapping a face in the original picture with a target face, or changing the style of an input illustration into another one, etc.
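To make the node-and-connection idea concrete, here is a rough sketch of the five-node graph described above, written as a Python dictionary in the spirit of ComfyUI's API-format workflow JSON. The node IDs, filenames, and default values are illustrative assumptions rather than an exported TensorArt template; a pair like ["1", 0] means "output slot 0 of node 1", which is exactly the circle-to-circle dragging described above.

```python
# Illustrative sketch of a minimal checkpoint -> LoRA -> prompts -> KSampler -> VAE graph.
# Filenames, IDs and settings are placeholders; real exports may differ slightly.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "myCheckpoint.safetensors"}},
    "2": {"class_type": "LoraLoader",                      # add-on on top of the checkpoint
          "inputs": {"lora_name": "myCharacter.safetensors",
                     "strength_model": 0.8, "strength_clip": 0.8,
                     "model": ["1", 0], "clip": ["1", 1]}},  # MODEL / CLIP outputs of node 1
    "3": {"class_type": "CLIPTextEncode",                  # positive prompt
          "inputs": {"text": "1girl, smile, garden", "clip": ["2", 1]}},
    "4": {"class_type": "CLIPTextEncode",                  # negative prompt
          "inputs": {"text": "lowres, bad hands", "clip": ["2", 1]}},
    "5": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 832, "height": 1216, "batch_size": 1}},
    "6": {"class_type": "KSampler",
          "inputs": {"model": ["2", 0], "positive": ["3", 0], "negative": ["4", 0],
                     "latent_image": ["5", 0], "seed": 42, "steps": 25, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
    "7": {"class_type": "VAEDecode",                       # LATENT -> samples, VAE from node 1
          "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
}

if __name__ == "__main__":
    for node_id, node in workflow.items():
        print(node_id, node["class_type"])
```

Reading the dictionary top to bottom mirrors the 1-5 walkthrough: loaders feed conditioning and latents into the KSampler, and the VAE decode turns the latent back into the final image.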

Tips for new Users

Intro
Hey there! If you're reading this, you're probably new to AI image generation and want to learn more. If you're not, you probably already know more than me :). Yeah, full disclosure: I'm still pretty inexperienced at this whole thing, but I thought I could still share some of the things I've learned with you! So, in no particular order:

1. You can like your own posts
I doubt there's anyone who doesn't know this already, but if you're posting your favorite generations and you care about getting likes, you can always like them yourself. Sketchy? Kinda. Do I still do it? Yes. And on the topic of getting more likes:

2. Likes will often be returned
Whenever I receive a like on one of my posts, I'll look at that person's pictures and heart any that I particularly enjoy. I know a lot of people do this, so one of the best ways to get people to notice and like your content is to just browse through posts and be generous with your own likes. It's a great way to get inspiration too!

3. Use turbo/lightning LoRAs
If you find yourself running out of credits, there are ways to conserve them. When I'm iterating on an idea, I'll use an SDXL model (Meina XL) paired with this LoRA. This lets me get high-quality images in 10 steps for only 0.4 credits! It's really nice, and it works with any SDXL model. Unfortunately, if there is a similar method for speeding up SD 1.5 models I don't know it, so it only works with XL.

4. Use ADetailer smartly
ADetailer is the best solution I've found for improving faces and hands. It's also a little difficult to figure out, so, though I'm still not a professional with it, I thought I could share some of the tricks I've learned. The models I normally use are face_yolov8s.pt and hand_yolov8s.pt. The "8s" versions are better than the "8n" versions, though they are slightly slower. In addition to these models, I'll often add the Attractive Eyes and Perfect Hand LoRAs respectively. These are all just little things you can do to improve these notoriously hard parts of image generation. Also, using ADetailer before upscaling the image is cheaper in terms of credits, though the upscaling process can sometimes mess up the hands and face a little bit, so there's some give and take there.

5. Use an image editing app
Wait a minute, I hear you saying, isn't this a guide for using Tensor Art? Yes, but you can still use other tools to improve your images. If I don't like a specific part of my image, I'll download it, open it in Krita (or Photoshop or GIMP) and work on it. My art skills are pretty bad (which is why I'm using this site in the first place), but I can still remove, recolor, or edit certain aspects of the image. I can then reupload it to Tensor Art and img2img with a high denoising strength to improve it further (there's a rough code sketch of this step at the end of the article). You could also just try inpainting the specific thing you want to change, but I always find it a bit of a struggle to get inpaint to make the changes I want.

6. Experiment!
The best way to learn is to do, so just start generating images, fiddling with settings, and trying new things. I still feel like I'm learning new stuff every day, and this technology is improving so fast that I don't think anyone will ever truly master it. But we can still try our hardest and hone our skills through experimentation, sharing knowledge, and getting more familiar with these models. And all the anime girls are a big plus too.

Outro
If you have anything to add, or even a tip you'd like to share, definitely leave a comment and maybe I can add it to this article.
This list is obviously not exhaustive, and I'm nowhere near as talented as some of the people on this platform. Still, I hope to have helped at least one person today. If that was you, maybe give the article a like? I appreciate it a ton, so if you enjoyed, just let me know. Thanks for reading!
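For anyone who ever scripts tip 5 outside the site, here is a hedged sketch of the same "edit, then img2img with a chosen denoising strength" round trip using the diffusers library. The model ID, file names, and settings are placeholders, not a recommendation from the original author.

```python
# Minimal img2img sketch, assuming diffusers, an SDXL checkpoint, and a CUDA GPU.
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# The image you touched up by hand in Krita / Photoshop / GIMP.
edited = load_image("edited_in_krita.png").resize((1024, 1024))

result = pipe(
    prompt="portrait of a woman, detailed face, soft lighting",
    image=edited,
    strength=0.6,           # "denoising strength": higher gives the model more freedom to repaint
    num_inference_steps=30,
    guidance_scale=6.0,
).images[0]
result.save("refined.png")
```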
• MOOD MAGIC SERIES • I. Melancholy

MOOD MAGIC: adding emotion to your prompts

Melancholy & Gloom

Overcast: Cloud-covered skies for subdued lighting.
Dim Lighting: Limited light sources for creating deep shadows.
Muted Colors: Toned-down color palette to convey sadness or desolation.
Dusky: Twilight ambiance, suggesting the fading light of day.
Foggy: A thick mist that obscures details and softens the scene.
Drizzly: Gentle rain that adds a reflective, melancholic quality.
Cloudy: Thick clouds that reduce brightness and saturate the scene with grey.
Desaturated: Low color saturation to enhance the bleak feel.
Shadowed: Prominent shadows that deepen the mood.
Moody Lighting: Emotionally charged lighting with strong contrasts.
Gloomy: Overall dark and dismal atmosphere.
Monochrome: Black and white or single-color dominance to strip away cheer.
Underexposed: Darker exposure to mimic a sense of foreboding.
Chiaroscuro: Strong contrasts between light and dark, emphasizing turmoil.
Hazy: Blurred or smoky atmosphere, creating a sense of mystery or unease.
Twilight: Dim natural lighting that can feel lonely or isolating.
Stormy: Implication of an approaching or ongoing storm to add tension.
Wintery: Cold, barren landscape cues, even in urban settings.
Grainy: Visual noise that adds an old or troubled quality.
Bleak: Stark, harsh lighting or barren scenery settings.
Ominous Clouds: Dark, menacing clouds that threaten bad weather.
Subdued Tones: Soft, low-key colors that don't catch the eye.
Cold Colors: Blues and greys to suggest chilliness and discomfort.
Rusty: Implications of decay and neglect.
Aged: A sense of time wearing down the scene, historical weariness.
Soft Focus: Slightly out-of-focus elements to create a sense of disorientation or confusion.
Tenebrous: Deeply shadowed, almost pitch-dark.
Low-Key Lighting: Minimal lighting, mostly in darkness with occasional highlights.
Pensive: Engaged in, involving, or reflecting deep or serious thought.
Yearning: A feeling of intense longing, typically for something that one has lost or been separated from.
Weary: Conveying a sense of tiredness or exhaustion, both physical and emotional.
Sparse: Minimalist or bare settings that suggest simplicity or emptiness.
Brooding: A deep, serious, and sometimes dark contemplation.
Silent: Lack of sound or motion, emphasizing solitude or contemplation.
Ephemeral: Fleeting or transitory, suggesting the transient nature of moments and emotions.
Desolate: Emptiness that conveys a sense of abandonment or loneliness.
Poetic: Imbued with a sense of beauty and melancholy, often through lyrical expression.
Moody Skies: Cloudy, stormy, or unsettled skies that reflect a turbulent emotional landscape.
Cold Light: Harsh, unyielding light that doesn't warm but isolates subjects.
Autumnal: Related to autumn, often seen as a melancholic season due to its association with the end of summer.
Faded: Colors or elements that have lost brightness, suggesting the passing of time.
Blue Hour: Moody, cool natural lighting obtained in the twilight hour just after sunset or just before sunrise.

Example using Stable Diffusion SDXL + refiner
Checkpoint: RealVis4
Cfg: 5.5
Steps: 40
Sampler: DPM++ 3M SDE Karras

Prompt: Visualize a close-up portrait of a young woman standing by a foggy window, her gaze distant and contemplative. The room is dimly lit, with only a soft, diffuse light filtering through the heavy overcast outside, casting subtle shadows across her face. The colors are desaturated, emphasizing a palette of cool grays and muted blues that reflect her somber mood. Her expression is serene yet melancholic, with her eyes slightly downcast as if lost in thought. The background is blurred, enhancing the sense of isolation and introspection. This portrait captures the essence of melancholy, framed in a moment of quiet solitude.

Negative: illustration, cartoon, anime, 3d, digital art, bad quality, CGI, sketch, drawn, blurry, painting, worst quality, low quality, bad anatomy, bad hands, bad body, missing fingers, extra digit, fewer digits
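For anyone reproducing the example locally, here is a minimal sketch of the same settings using the diffusers library. It assumes "RealVis4" refers to the RealVisXL V4.0 checkpoint (SG161222/RealVisXL_V4.0 on Hugging Face), omits the refiner pass, and approximates "DPM++ 3M SDE Karras" with diffusers' SDE-DPM-Solver++ scheduler with Karras sigmas; treat all of these as assumptions.

```python
# Melancholy portrait example, approximated with diffusers; settings mirror the article.
import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16    # assumed mapping for "RealVis4"
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, algorithm_type="sde-dpmsolver++", use_karras_sigmas=True
)

prompt = (
    "close-up portrait of a young woman by a foggy window, distant contemplative gaze, "
    "dimly lit room, soft diffuse overcast light, subtle shadows, desaturated cool grays "
    "and muted blues, serene yet melancholic expression, blurred background, quiet solitude"
)
negative = (
    "illustration, cartoon, anime, 3d, digital art, bad quality, CGI, sketch, drawn, "
    "blurry, painting, worst quality, low quality, bad anatomy, bad hands"
)

image = pipe(prompt, negative_prompt=negative,
             num_inference_steps=40, guidance_scale=5.5).images[0]
image.save("melancholy_portrait.png")
```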
Buzz words: LIGHTING

Getting the lighting right is key to making your AI-generated images look super realistic. This guide gives you the top keywords to use in your prompts to nail the lighting every time. Whether you're after dramatic shadows or soft, natural light, these tips will help your images look lifelike and set the tone of your composition.

Ambient light: Soft, even lighting that fills the entire scene, reducing shadows.
Chiaroscuro lighting: A technique that uses strong contrasts between light and dark to create a dramatic, three-dimensional effect.
Rim light: Light that outlines the subject, emphasizing its edges and creating a glowing effect.
Diffused light: Soft light scattered in many directions, minimizing harsh shadows.
Natural light: Light from the sun, moon, or other natural sources, offering realism and variation.
Backlight: Light coming from behind the subject, creating a silhouette or halo effect.
Volumetric light: Light that interacts with particles in the air, such as fog or dust, creating visible light rays and enhancing the sense of depth in the scene.
Polarized light: Light that vibrates in parallel planes.
Emissive light: Light emitted from surfaces or objects themselves, often used to simulate glowing materials or lights.
Directional light: Focused light from a specific direction, creating strong shadows and highlights.
Soft light: Gentle light that produces minimal shadows, creating a smoother look.
Hard light: Sharp, intense light that casts strong shadows and highlights details.
Spotlight: An intense, focused beam that highlights a set area or subject.
Artificial light: Light from man-made sources, allowing precise control over the scene: halogen, fluorescent, blacklight, LED, xenon, plasma, ultraviolet, incandescent, neon, infrared, sodium vapor lights, metal halide lights, krypton, photoluminescent, ceramic metal halide, HMI, CCFL, CFL.
Low-key light: Predominantly dark lighting with high contrast, often creating a dramatic or moody atmosphere.
High-key light: Bright, low-contrast lighting that minimizes shadows.
Bounce lighting / reflected lighting: Light reflected off a surface to soften the effect and spread it more evenly.
Side lighting: Light coming from the side of the subject.
Caustic lighting: Light patterns created when light is refracted or reflected through transparent or reflective materials, producing intricate and often beautiful effects.
Uplighting: Light directed upwards. Great for emphasizing architectural features.
Color gel lighting: The use of colored filters over lights to alter the color or mood of the scene.
Gobo lighting: Using a stencil or template placed in front of a light source to project patterns or shapes onto a surface.
Split lighting: Lighting that illuminates one half of the subject's face while leaving the other half in shadow, creating a strong, dramatic effect.
Butterfly lighting: Light placed above and in front of the subject, creating a butterfly-shaped shadow under the nose, often used in glamour photography.
Rembrandt lighting: A technique where light creates a triangle of illumination on the cheek opposite the light source, adding depth and character.
Specular lighting: Sharp, bright reflections from shiny surfaces, emphasizing glossiness and texture.
Natural breakup lighting / dappled lighting: Using irregular patterns to mimic natural light effects, such as light filtering through leaves.
Subsurface scattering: Light that penetrates the surface of a translucent material, scattering within and then exiting at a different point, adding realism to materials like skin or wax.
Golden hour: Warm, golden natural lighting obtained shortly after sunrise or shortly before sunset. Creates long, soft shadows.
Blue hour: Moody, cool natural lighting obtained in the twilight hour just after sunset or just before sunrise.
Clamshell lighting: A portrait lighting setup using two light sources, one above and one below the subject's face.
Catch light: A small reflection of the light source in the subject's eyes, adding life and dimension to portraits.
Cross lighting: Two light sources positioned at opposite sides of the subject, creating dramatic shadows and highlights.
Tenebrism: Aggressive contrast between light and dark, producing dark and gloomy images.
Contre-jour: A lighting technique that produces clear silhouettes through the use of backlighting.
Sfumato: An artistic technique of soft transitions between colors and tones, resulting in a dreamy effect with no clear boundaries, e.g. the Mona Lisa.
Ray tracing: A rendering technique that simulates the way light interacts with the scene, tracing light from the source as it bounces off surfaces and reaches the viewer's eye.
Three-point lighting: A cinematic lighting technique using a key light, a fill light, and a backlight.
Global illumination: A computer-graphics technique that adds more realistic lighting to 3D scenes.
Bloom: Simulates the glow around bright light sources, creating a soft halo.
Luminescence: Emission of light by a substance not resulting from heat; it occurs through various processes such as chemical reactions or electrical energy.
Bioluminescence: A cold light produced by a chemical reaction inside a living organism.
Quickstart Guide to Stable Video Diffusion

What is Stable Video Diffusion (SVD)?
Stable Video Diffusion (SVD) from Stability AI is an extremely powerful image-to-video model: it accepts an image input, "injects" motion into it, and produces some fantastic scenes. SVD is a latent diffusion model trained to generate short video clips from image inputs. There are two models: the first, img2vid, was trained to generate 14 frames of motion at a resolution of 576×1024, and the second, img2vid-xt, is a finetune of the first, trained to generate 25 frames of motion at the same resolution. The newly released (2/2024) SVD 1.1 is further finetuned on a set of parameters to produce excellent, high-quality outputs, but it requires specific settings, detailed below.

Why should I be excited by SVD?
SVD creates beautifully consistent video movement from our static images!

How can I use SVD?
ComfyUI is leading the pack when it comes to SVD generation, with official SVD support. Twenty-five frames of 1024×576 video use less than 10 GB of VRAM to generate; it's entirely possible to run the img2vid and img2vid-xt models on a GTX 1080 with 8 GB of VRAM. There's still no word (as of 11/28) on official SVD support in Automatic1111. If you'd like to try SVD on Google Colab, this workbook works on the Free Tier: https://github.com/sagiodev/stable-video-diffusion-img2vid/. Generation time varies, but is generally around 2 minutes on a V100 GPU.

You'll need to download one of the SVD models from the links below, placing them in the ComfyUI/models/checkpoints directory. After updating your ComfyUI installation, you'll see new nodes for VideoLinearCFGGuidance and SVD_img2vid_Conditioning. The Conditioning node takes the following inputs. You can download ComfyUI workflows for img2video and txt2video below, but keep in mind you'll need an updated ComfyUI and may be missing additional video nodes; I recommend using the ComfyUI Manager to identify and download missing nodes!

Suggested Settings
The settings below are suggested settings for each SVD component (node) which I've found produce the most consistently usable outputs with the img2vid and img2vid-xt models.

Settings – img2vid-xt-1.1
February 2024 saw the release of a finetuned SVD model, version 1.1. This version only works with a very specific set of parameters to improve the consistency of outputs. If using the img2vid-xt-1.1 model, the following settings must be applied to produce the best results.

The easiest way to generate videos, however, is in tensor.art: compared to the explanation above, all you need to do is input the prompt you want, select the model you like, set the ratio, and set the frame count in the AnimateDiff menu.

Output Examples

Limitations
It's not perfect! Currently there are a few issues with the implementation, including:
Generations are short! Only <=4-second generations are possible at present.
Sometimes there's no motion in the outputs. We can tweak the conditioning parameters, but sometimes the images just refuse to move.
The models cannot be controlled through text.
Faces, and bodies in general, often aren't the best!
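If you prefer scripting SVD outside ComfyUI, here is a hedged sketch using the diffusers library. The model ID and conditioning values are commonly published defaults rather than settings from this article; treat them as starting points to tweak.

```python
# Minimal SVD img2vid-xt sketch with diffusers, assuming a CUDA GPU with enough VRAM.
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")

image = load_image("input.png").resize((1024, 576))   # SVD's training resolution

frames = pipe(
    image,
    decode_chunk_size=8,        # lower this if you run out of VRAM
    motion_bucket_id=127,       # higher values request more motion
    noise_aug_strength=0.02,    # more noise = more motion, less faithfulness to the input
).frames[0]

export_to_video(frames, "output.mp4", fps=7)
```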
List of style collections - focusing on anime character examples (continually updated)

AI image-generating platforms like Tensor.art offer diverse anime styles, enabling users to create artwork inspired by popular anime aesthetics in a range of distinct looks. These collections aim to cater to different preferences, from classic to contemporary anime illustration, all in one place.

P.S.1 I will continue updating this post, maybe every 2 weeks, whenever I find a unique style (either a LoRA or a model) that is worth listing here from my own perspective. If anyone has a list of favorite styles in mind, feel free to share them here or even create your own post. :D

P.S.2 People normally mix multiple LoRAs at once, and the core model (checkpoint) varies in base style depending on the prompt used. Therefore, in the following examples I chose only a single LoRA or checkpoint, without mixing anything. If I have misattributed a style's contribution, I apologize in advance, since I am just a beginner in the art community.

Here are some examples:
Anime Lineart / Manga-like (线稿/線画/マンガ風/漫画风) Style (LoRA) https://tensor.art/models/623935989624337542
Spacezin Sketch Style (LoRA) https://tensor.art/models/638083414328801488
Cute Chibi - V.1 (LoRA) https://tensor.art/models/726716640076597245
CAT - Citron Anime Treasure (Checkpoint) https://tensor.art/models/713607777118974323
LizMix V.7.0 (Checkpoint) https://tensor.art/models/721034681811855891
Flower style (LoRA) https://tensor.art/models/699582840586758007
Art Nouveau Style - Oosayam (LoRA) https://tensor.art/models/654562112921690173
Torino Style - v.2.0.09 (LoRA) https://tensor.art/models/705577639974520212
Yody PVC 3D Print - 1.0 (Checkpoint) https://tensor.art/models/673632484975460872
Eldritch Expressionism style (LoRA) https://tensor.art/models/708171473803739178
[Y5] Impressionism Style 印象派风格 (LoRA) https://tensor.art/models/621173217551417505
surrealism - 2024-02-17 (LoRA) https://tensor.art/models/695557949424221333
pop-art - 01 style (LoRA) https://tensor.art/models/697182692602582375
FF Style: Kazimir Malevich | Suprematism (LoRA) https://tensor.art/models/655758742350092928

Hoping these collections (today and in the future) will allow AI artists and enthusiasts to generate anime-inspired images effortlessly, blending creativity with advanced AI technology to bring their visions to life. :D
Prompt reference for "Lighting Effects"

Hello. I usually use "lighting/lighting effects" when generating images, and here I will introduce some of the "words" I use when I want to add them. Please note that these words alone do not guarantee the effect: the result will differ depending on the base model, the LoRA, the sampling method, and where you place the word in the prompt.

Words related to "lighting effects"
・Backlight: Light from behind the subject.
・Colorful lighting: The impression itself is not colored, but the color changes depending on the light.
・Moody lighting: Natural lighting, not direct artificial light.
・Studio lighting: A term used to describe the artificial lighting of a photography studio.
・Directional light: A light source that shines parallel rays in a selected direction.
・Dramatic lighting: Lighting techniques from the field of photography.
・Spot lighting: A lighting technique that uses artificial light in a small area.
・Cinematic lighting: A single word that covers several lighting techniques used in movies.
・Bounce lighting: Light reflected by a reflector plate or similar surface.
・Practical lighting: Photographs and videos that depict the light source itself in the composition.
・Volumetric lighting: A word derived from 3DCG. It tends to produce a picture with a divine golden light source.
・Dynamic lighting: I don't really understand what it means, but it tends to create high-contrast images.
・Warm lighting: Creates a warm picture illuminated with warm colors.
・Cold lighting: Lights the scene with a cold light source.
・High-key lighting: Soft light, minimal shadows, low contrast, resulting in bright frames.
・Low-key lighting: Provides high contrast, but the impression is a little weak.
・Hard light: Strong light; highlights appear strong.
・Soft light: A word that refers to faint light.
・Strobe lighting: Strong artificial light (stroboscopic lighting).
・Ambient light: An English word that refers to ambient/indoor lighting.
・Flash lighting: For some reason, the characters themselves tend to emit light, and there are often flashes of light (flash lighting photography).
・Natural lighting: Tends to create a natural-looking picture, in contrast with artificial light.
The future of AI image generation: endless possibilities

Introduction {{For those who are about to start AI image generation}}
In recent years, advances in AI technology have brought about revolutionary changes in the field of image generation. In particular, AI-powered illustration generation has become a powerful tool for artists and designers. However, as this technology advances, issues of creativity and copyright arise. In this article, we will explain the possibilities of AI image generation, specific use cases, how to create prompts, how to use LoRA and its effects, keywords for improving image quality, copyright considerations, and more.

Fundamentals of AI image generation
AI image generation uses artificial intelligence to learn from data and generate new images. Deep learning techniques are often used for this, and one notable approach is Stable Diffusion. Stable Diffusion employs a probabilistic method called a diffusion model to gradually remove noise during image generation, resulting in highly realistic, high-quality output.

Generating realistic images
AI technology is excellent not only for creating cute illustrations but also for generating realistic images. For example, you can generate high-resolution images that resemble photorealistic landscapes or portraits. By utilizing Stable Diffusion, it is possible to generate more detailed images, which expands the possibilities of application in various fields such as advertising, film production, and game design.

Generating cute illustrations
One of the practical applications of AI image generation is the creation of cute illustrations. This is useful for things like character design and avatar creation, allowing you to quickly generate different styles. This process typically involves collecting a large dataset of illustrations, training an AI model on this data to learn different styles and patterns, and generating new illustrations based on user input or keywords.

Creativity and AI
AI image generation also influences creative ideas. Artists can use AI-generated images as inspiration for new works or to expand on ideas, which can lead to the creation of new styles and concepts never thought of before.

Use and effects of LoRA
LoRA (Low-Rank Adaptation) is a technique used to improve the performance of AI models. Its benefits include:
1. Fine-tuning models: LoRA allows you to fine-tune existing AI models to learn specific styles and features, allowing for customization based on user needs.
2. Efficient learning: LoRA reduces the need for large-scale data collection and training costs by efficiently training models using small datasets.
3. Rapid adaptation: LoRA allows you to quickly adapt to new styles and trends, making it easy to generate images tailored to your current needs.
For example, LoRA can be leveraged to efficiently achieve high-quality results when generating illustrations in a specific style (see the short code sketch at the end of this article).

Creating prompts
When instructing an AI to generate illustrations, it's important to create effective prompts. Key points include providing specific instructions, using the right keywords, trial and error, and an optional reference image to help the AI figure out what you're looking for.

Keywords for improving image quality
When creating prompts for AI image generation, you can incorporate keywords related to image quality to improve the overall quality of the generated images. Useful keywords include "high resolution," "detail," "clean lines," "high quality," "sharp," "bright colors," and "photorealistic."

Copyright considerations
Image generation using AI also raises copyright issues. If the dataset used to train an AI model contains copyrighted works, the resulting images may infringe copyright. When using AI image generation tools, it's important to be aware of the data source, ensure that the generated images comply with copyright law, and check the license agreement.

Conclusion
AI image generation offers great possibilities for artists and designers, but it also raises challenges related to copyright. By using data responsibly and understanding copyright law, you can leverage AI technology to create innovative work. Leveraging technologies like LoRA can further improve efficiency and quality, and users can adjust the output by incorporating image-enhancement keywords into the prompt. Let's explore new ways of expression while staying aware of advances in AI technology and the considerations that come with them!
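As promised above, here is a hedged sketch of what "LoRA as an add-on to a base model" looks like in code with the diffusers library (a recent version with PEFT installed). The repository names, LoRA file, and weight are placeholders, not specific recommendations.

```python
# Illustrative only: attach a LoRA to a base SDXL checkpoint and control its strength.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load a style LoRA on top of the base checkpoint and set how strongly it applies.
pipe.load_lora_weights("path/to/my_style_lora.safetensors", adapter_name="my_style")
pipe.set_adapters(["my_style"], adapter_weights=[0.8])

image = pipe(
    "a watercolor illustration of a lighthouse at dawn, high resolution, detailed, sharp",
    num_inference_steps=30,
    guidance_scale=6.0,
).images[0]
image.save("lora_example.png")
```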
Stylistic QR Code with Stable Diffusion

Source: anfu.me (nowadays you can easily create a QR code inside tensor.art with ControlNet; next time I will write a guide about that.)

Yesterday, I created this image using Stable Diffusion and ControlNet, and shared it on Twitter and Instagram – an illustration that also functions as a scannable QR code. The process of creating it was super fun, and I'm quite satisfied with the outcome. In this post, I would like to share some insights into my learning journey and the approaches I adopted to create this image. Additionally, I want to take this opportunity to credit the remarkable tools and models that made this project possible.

Getting into Stable Diffusion
This year has witnessed an explosion of mind-boggling AI technologies, such as ChatGPT, DALL-E, Midjourney, Stable Diffusion, and many more. As a former photographer with some interest in design and art, being able to generate images directly from imagination in minutes is undeniably tempting. So I started by trying Midjourney; it's super easy to use, very expressive, and the quality is actually pretty good. It would honestly be my recommendation for anyone who wants to get started with generative AI art. By the way, Inès has also delved into it and become quite good at it now; go check her work on her new Instagram account @a.i.nes. On my end, being a programmer with strong preferences, I would naturally seek greater control over the process. This brought me to the realm of Stable Diffusion. I started with this guide: Stable Diffusion LoRA Models: A Complete Guide. The benefit of being late to the party is that there are already a lot of tools and guides ready to use. Setting up the environment was quite straightforward, and luckily my M1 Max's GPU is supported.

QR Code Image
A few weeks ago, nhciao on Reddit posted a series of artistic QR codes created using Stable Diffusion and ControlNet. The concept behind them fascinated me, and I definitely wanted to make one of my own. So I did some research and managed to find the original article in Chinese: Use AI to Generate Scannable Images. The author provided insights into their motivations and the process of training the model, although they did not release the model itself. They are building a service called QRBTF.AI to generate such QR codes, but it is not yet available. Then one day I found a community model, the QR Pattern Controlnet Model, on CivitAI. I knew I had to give it a try!

Setup
My goal was to generate a QR code image that directs to my website while containing elements that reflect my interests. I ended up going with a slightly cypherpunk style and a character representing myself. :P
Disclaimer: I'm certainly far from being an expert in AI or related fields. In this post, I'm simply sharing what I've learned and the process I followed. My understanding may not be entirely accurate, and there are likely optimizations that could simplify the process. If you have any suggestions or comments, please feel free to reach out using the links at the bottom of the page. Thank you!

1. Setup Environment
I pretty much followed Stable Diffusion LoRA Models: A Complete Guide to install the web UI AUTOMATIC1111/stable-diffusion-webui, download models I was interested in from CivitAI, and so on. As a side note, I found that the user experience of the web UI is not super friendly; some of that is, I guess, down to architectural issues that might not be easy to improve, but luckily I found a pretty nice theme, canisminor1990/sd-webui-kitchen-theme, that improves a bunch of small things.
In order to use ControlNet, you will also need to install the Mikubill/sd-webui-controlnet extension for the web UI. Then you can download the QR Pattern Controlnet Model, put the two files (.safetensors and .yaml) under the stable-diffusion-webui/models/ControlNet folder, and restart the web UI.

2. Create a QR Code
There are hundreds of QR code generators full of ads or paid services, and we certainly don't need that fanciness – because we are going to make it much fancier 😝! So I ended up finding the QR Code Generator Library, a playground for an open-source QR code generator. It's simple, but exactly what I need. It's better to use a medium error-correction level or above to make the code easier to recognize later. A small tip: you can try different mask patterns to find a color distribution that better fits your design.

3. Text to Image
As in the regular text2image workflow, we need to provide some prompts for the AI to generate the image from. Here are the prompts I used:
Prompts: (one male engineer), medium curly hair, from side, (mechanics), circuit board, steampunk, machine, studio, table, science fiction, high contrast, high key, cinematic light, (masterpiece, top quality, best quality, official art, beautiful and aesthetic:1.3), extreme detailed, highest detailed, (ultra-detailed)
Negative prompts: (worst quality, low quality:2), overexposure, watermark, text, easynegative, ugly, (blurry:2), bad_prompt, bad-artist, bad hand, ng_deepnegative_v1_75t
Then we go to the ControlNet section, upload the QR code image we generated earlier, and configure the parameters as suggested on the model homepage. Then you can generate a few images and see if they meet your expectations. You will also need to check whether the generated image is scannable; if not, you can tweak the "Start controlling step" and "End controlling step" to find a good balance between stylization and QR-code-likeness. (A condensed code sketch of steps 2-3 appears after the references below.)

4. I'm feeling lucky!
After finding a set of parameters I am happy with, I increase the Batch Count to around 100 and let the model generate variations randomly. Later I go through them and pick the one with the best composition and details for further refinement. This can take a lot of time, and also a lot of resources from your processors, so I usually start it before going to bed and leave it overnight. Here are some examples of the generated variations (not all of them are scannable). From approximately one hundred variations, I ultimately chose the following image as the starting point: it has a pretty interesting composition while being less obvious as a QR code. So I decided to proceed with it and add a bit more detail. (You can compare it with the final result to see the changes I made.)

5. Refining Details
Update: I recently built a toolkit to help with this process; check my new blog post 👉 Refine AI Generated QR Code for more details.
The images generated by the model are not perfect in every detail. For instance, you may have noticed that the hand and face appear slightly distorted, and the three anchor boxes in the corners are less visually appealing. We can use the inpaint feature to tell the model to redraw some parts of the image (it is better to keep the same or similar prompts as in the original generation). Inpainting typically requires a similar amount of time as generating a text-to-image, and it involves either luck or patience.
Often, I use Photoshop to "borrow" some parts from previously generated images and use the spot-healing brush tool to clean up glitches and artifacts. My Photoshop layers look like this. After making these adjustments, I send the combined image back for inpainting again to ensure a more seamless blend, or to search for components I didn't find in other images. Specifically for the QR code, in some cases ControlNet may not have enough priority, causing the prompts to take over and resulting in certain parts of the QR code not matching. To address this, I overlay the original QR code image onto the generated image (as shown in the left image below), identify any mismatches, and use a brush tool to paint those parts with the correct colors (as shown in the right image below). I then export the marked image for inpainting once again, adjusting the denoising strength to approximately 0.7. This ensures that the model overrides our marks while still respecting the colors to some degree. Ultimately, I iterate through this process multiple times until I am satisfied with every detail.

6. Upscaling
The recommended generation size is 920x920 pixels. However, the model does not always generate highly detailed results at the pixel level; as a result, details like the face and hands can appear blurry when they are too small. To overcome this, we can upscale the image, giving the model more pixels to work with. The SD Upscaler script in the img2img tab is particularly effective for this purpose. You can refer to the guide Upscale Images With Stable Diffusion for more information.

7. Post-processing
Lastly, I use Photoshop and Lightroom for subtle color grading and post-processing, and we are done! The one I ended up with does not have very good error tolerance; you might need to try a few times, or use a more forgiving scanner, to get it scanned. :P And using a similar process, I made another one for Inès.

Conclusion
Creating this image took me a full day, with a total of 10 hours of learning, generating, and refining. The process was incredibly enjoyable for me, and I am thrilled with the end result! I hope this post can offer you some fundamental concepts or inspire you to embark on your own creative journey. There is undoubtedly much more to explore in this field, and I am eager to see what's coming next! Join my Discord server and let's explore more together! If you want to learn more about the refining process, check my new blog post: Refining AI Generated QR Code.

References
Here is the list of resources for easier reference.
Concepts: Stable Diffusion, ControlNet
Tools (hardware and software I am using):
AUTOMATIC1111/stable-diffusion-webui - Web UI for Stable Diffusion
canisminor1990/sd-webui-kitchen-theme - Nice UI enhancement
Mikubill/sd-webui-controlnet - ControlNet extension for the web UI
QR Code Generator Library - QR code generator that is ad-free and customizable
Adobe Photoshop - The tool I used to blend the QR code and the illustration
Models:
ControlNet models for QR codes (you can pick one of them): QR Pattern Controlnet Model, Controlnet QR Code Monster, IoC Lab Control Net
Checkpoint model (you can use any checkpoint you like): Ghostmix Checkpoint - A very high-quality checkpoint I use.
Tutorials:
Stable Diffusion LoRA Models: A Complete Guide - The one I used to get started
(Chinese) Use AI to generate scannable images - Unfortunately the article is in Chinese and I didn't find an English version of it.
Upscale Images With Stable Diffusion - Enlarge the image while adding more details
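As promised in step 3, here is a compressed sketch of the "generate a QR code, then constrain the image with ControlNet" pipeline, using the qrcode package and the diffusers library instead of the AUTOMATIC1111 web UI. The ControlNet repository shown is the "QR Code Monster" model listed in the references, and every setting is an illustrative starting point rather than the author's exact parameters.

```python
# Hedged sketch: QR code with medium error correction, then ControlNet-guided generation.
import qrcode
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# 1. Build the QR code (medium error correction, as the article recommends).
qr = qrcode.QRCode(error_correction=qrcode.constants.ERROR_CORRECT_M, box_size=16, border=4)
qr.add_data("https://antfu.me")
qr.make(fit=True)
qr.make_image(fill_color="black", back_color="white").save("qr.png")
qr_image = Image.open("qr.png").convert("RGB").resize((768, 768))

# 2. Generate an illustration constrained by the QR pattern.
controlnet = ControlNetModel.from_pretrained(
    "monster-labs/control_v1p_sd15_qrcode_monster", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "one male engineer, circuit board, steampunk, cinematic light, extremely detailed",
    negative_prompt="worst quality, low quality, blurry, watermark, text",
    image=qr_image,
    controlnet_conditioning_scale=1.1,   # raise this if the result stops scanning
    control_guidance_start=0.0,          # analogous to "Start controlling step"
    control_guidance_end=0.9,            # analogous to "End controlling step"
    num_inference_steps=30,
).images[0]
image.save("qr_art.png")
```

As in the article, the trade-off lives in the conditioning scale and the control start/end range: stronger, longer control keeps the code scannable at the cost of stylization.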
The Marvel of Tanjore Temple: A Timeless Treasure

Introduction The Tanjore Temple, also known as Brihadeeswarar Temple, is a striking example of India’s architectural grandeur and rich cultural heritage. Nestled in the historic town of Thanjavur in Tamil Nadu, this UNESCO World Heritage Site draws thousands of visitors each year, eager to marvel at its towering vimana (temple tower), intricate carvings, and vibrant history. Historical Background Built by the great Chola emperor Raja Raja Chola I in the 11th century, the Tanjore Temple stands as a testament to the ingenuity and vision of ancient Indian architects and artisans. Completed in 1010 AD, it celebrated its millennium in 2010, marking a thousand years of awe-inspiring presence. Architectural Splendor The Vimana The most striking feature of the Tanjore Temple is its colossal vimana, which rises to a height of 66 meters. This towering structure is crowned with a massive dome, made from a single piece of granite weighing approximately 80 tons. This engineering marvel leaves historians and architects alike in awe, given the lack of modern machinery during its construction. The Sanctum At the heart of the temple lies the sanctum sanctorum, housing a massive Shiva lingam. The inner walls of the sanctum are adorned with exquisite frescoes and murals, depicting various mythological scenes and showcasing the artistic brilliance of the Chola period. Intricate Carvings Every inch of the Tanjore Temple is a canvas of intricate carvings. From the elaborate depictions of deities and mythological narratives on the walls to the ornate pillars and ceilings, the temple is a visual feast. These carvings not only serve as decorative elements but also provide a glimpse into the socio-cultural milieu of the Chola dynasty. Cultural Significance Religious Importance The Tanjore Temple is dedicated to Lord Shiva and holds immense religious significance for Hindus. It is one of the largest temples in India and serves as a major pilgrimage site, especially during festivals like Maha Shivaratri. Devotees from across the country flock to the temple to seek blessings and participate in the vibrant festivities. Artistic Heritage The temple is a treasure trove of Chola art and architecture. The frescoes and murals, in particular, offer invaluable insights into the artistic and cultural landscape of the period. The depictions of dance forms, musical instruments, and attire provide a vivid picture of the era’s cultural richness. Visiting Tanjore Temple Best Time to Visit The ideal time to visit Tanjore Temple is between October and March when the weather is pleasant. The temple complex is open from early morning till evening, allowing visitors ample time to explore and soak in its magnificence. How to Reach Thanjavur is well-connected by road, rail, and air. The nearest airport is Tiruchirappalli International Airport, about 60 kilometers away. Thanjavur Junction is the nearest railway station, with regular trains from major cities like Chennai, Bangalore, and Coimbatore. Buses and taxis are also readily available for local transportation. Accommodation Thanjavur offers a range of accommodation options, from budget hotels to luxury resorts, catering to the diverse needs of travelers. Staying in the town allows visitors to explore not just the temple, but also other nearby attractions like the Thanjavur Royal Palace and the Saraswathi Mahal Library. Conclusion The Tanjore Temple is more than just an architectural marvel; it is a living testament to India’s rich cultural and religious heritage. 
Its towering vimana, intricate carvings, and historical significance make it a must-visit destination for history enthusiasts, art lovers, and spiritual seekers alike. Plan your visit to this timeless treasure and immerse yourself in the grandeur of the Chola dynasty.
[Guide] Make your own Loras, easy and free

This article helped me create my first LoRA and upload it to Tensor.art. Although Tensor.art has its own LoRA training, this article helps you understand how to create a LoRA well.

🏭 Preamble
Even if you don't know where to start or don't have a powerful computer, I can guide you to making your first LoRA and more! In this guide we'll be using resources from my GitHub page. If you're new to Stable Diffusion, I also have a full guide to generate your own images and learn useful tools. I'm making this guide for the joy it brings me to share my hobbies and the work I put into them. I believe all information should be free for everyone, including image generation software. However, I do not support you if you want to use AI to trick people, scam people, or break the law. I just do it for fun. Also, here's a page where I collect Hololive LoRAs.

📃 What you need
An internet connection. You can even do this from your phone if you want to (as long as you can prevent the tab from closing).
Knowledge about what LoRAs are and how to use them.
Patience. I'll try to explain these new concepts in an easy way. Just try to read carefully, use critical thinking, and don't give up if you encounter errors.

🎴 Making a LoRA
It has a reputation for being difficult: so many options, and nobody explains what any of them do. Well, I've streamlined the process so that anyone can make their own LoRA, starting from nothing, in under an hour, while keeping some advanced settings you can use later on. You could of course train a LoRA on your own computer, provided you have an Nvidia graphics card with 6 GB of VRAM or more. We won't be doing that in this guide though; we'll be using Google Colab, which lets you borrow Google's powerful computers and graphics cards for free for a few hours a day (some say it's 20 hours a week). You can also pay $10 to get up to 50 extra hours, but you don't have to. We'll also be using a little bit of Google Drive storage. This guide focuses on anime, but it also works for photorealism. However, I won't help you if you want to copy real people's faces without their consent.

🎡 Types of LoRA
As you may know, a LoRA can be trained and used for: a character or person, an artstyle, a pose, a piece of clothing, etc. However, there are also different types of LoRA now:
LoRA: The classic; works well for most cases.
LoCon: Has more layers which learn more aspects of the training data. Very good for artstyles.
LoHa, LoKR, (IA)^3: These use novel mathematical algorithms to process the training data. I won't cover them as I don't think they're very useful.

📊 First Half: Making a Dataset
This is the longest and most important part of making a LoRA. A dataset is (for us) a collection of images and their descriptions, where each pair has the same filename (e.g. "1.png" and "1.txt"), and they all have something in common which you want the AI to learn. The quality of your dataset is essential: you want your images to have at least 2 examples each of different poses, angles, backgrounds, clothes, etc. If all your images are face close-ups, for example, your LoRA will have a hard time generating full-body shots (though it's still possible!), unless you add a couple of examples of those. As you add more variety, the concept will be better understood, allowing the AI to create new things that weren't in the training data; for example, a character may then be generated in new poses and in different clothes. You can train a mediocre LoRA with a bare minimum of 5 images, but I recommend 20 or more, and up to 1000.
As for the descriptions: for general images you want short but detailed sentences such as "full body photograph of a woman with blonde hair sitting on a chair". For anime you'll need to use booru tags (1girl, blonde hair, full body, on chair, etc.). Let me describe how tags work in your dataset: you need to be detailed, as the Lora will reference what's going on by using the base model you train on. If there is something present in all your images that you don't include in your tags, it will become part of your Lora. This is because the Lora absorbs details that can't be described easily with words, such as faces and accessories. Thanks to this, you can let those details be absorbed into an activation tag, which is a unique word or phrase that goes at the start of every text file and which makes your Lora easy to prompt.

You may gather your images online and describe them manually, but fortunately you can do most of this process automatically using my new 📊 dataset maker colab. Here are the steps:

1️⃣ Setup: This will connect to your Google Drive. Choose a simple name for your project and a folder structure you like, then run the cell by clicking the floating play button on the left side. It will ask for permission; accept it to continue the guide. If you already have images to train with, upload them to your Google Drive's "lora_training/datasets/project_name" (old) or "Loras/project_name/dataset" (new) folder, and you may choose to skip step 2.

2️⃣ Scrape images from Gelbooru: In the case of anime, we will use the vast collection of available art to train our Lora. Gelbooru sorts images through thousands of booru tags describing everything about an image, which is also how we'll tag our images later. Follow the instructions on the colab for this step; basically, you want to request images that contain specific tags representing your concept, character or style. When you run this cell it will show you the results and ask if you want to continue. Once you're satisfied, type yes and wait a minute for your images to download.

3️⃣ Curate your images: There are a lot of duplicate images on Gelbooru, so we'll be using the FiftyOne AI to detect them and mark them for deletion. This will take a couple of minutes once you run the cell. They won't be deleted right away: eventually an interactive area will appear below the cell, displaying all your images in a grid. Here you can select the ones you don't like and mark them for deletion too. Follow the instructions in the colab; it is beneficial to delete low-quality or unrelated images that slipped their way in. When you're finished, press Enter in the text box above the interactive area to apply your changes.

4️⃣ Tag your images: We'll be using the WD 1.4 tagger AI to assign anime tags that describe your images, or the BLIP AI to create captions for photorealistic/other images. This takes a few minutes. I've found good results with a tagging threshold of 0.35 to 0.5. After running this cell it'll show you the most common tags in your dataset, which will be useful for the next step.

5️⃣ Curate your tags: This step for anime tags is optional, but very useful. Here you can assign the activation tag (also called trigger word) for your Lora. If you're training a style, you probably don't want any activation tag, so that the Lora is always in effect. If you're training a character, I myself tend to delete (prune) common tags that are intrinsic to the character, such as body features and hair/eye color. This causes them to get absorbed by the activation tag. Pruning makes prompting with your Lora easier, but also less flexible. Some people like to prune all clothing to have a single tag that defines a character outfit; I do not recommend this, as too much pruning will affect some details. A more flexible approach is to merge tags: for example, if we have some redundant tags like "striped shirt, vertical stripes, vertical-striped shirt", we can replace all of them with just "striped shirt". You can run this step as many times as you want.
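If you ever want to do this tag curation outside the colab, the same idea is easy to script. This is only a minimal sketch; the folder path, trigger word, and pruned tags are example values, not anything the colab prescribes:

```python
from pathlib import Path

# Example values; use your own project folder, trigger word, and tag list.
dataset = Path("Loras/project_name/dataset")
activation_tag = "mycharacter"
prune = {"blonde hair", "blue eyes", "long hair"}  # traits intrinsic to the character

for caption_file in dataset.glob("*.txt"):
    # Booru-style captions are comma-separated tags.
    tags = [t.strip() for t in caption_file.read_text(encoding="utf-8").split(",")]
    # Drop pruned tags and any old copy of the activation tag, then put the trigger word first.
    tags = [t for t in tags if t and t not in prune and t != activation_tag]
    caption_file.write_text(", ".join([activation_tag] + tags), encoding="utf-8")
```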
6️⃣ Ready: Your dataset is stored in your Google Drive. You can do anything you want with it, but we'll be going straight to the second half of this tutorial to start training your Lora!

⭐ Second Half: Settings and Training
This is the tricky part. To train your Lora we'll use my ⭐ Lora trainer colab. It consists of a single cell with all the settings you need. Many of these settings don't need to be changed; however, this guide and the colab will explain what each of them does, so that you can play with them in the future. Here are the settings:

▶️ Setup: Enter the same project name you used in the first half of the guide and it'll work automatically. Here you can also change the base model for training. There are 2 recommended defaults, but alternatively you can copy a direct download link to a custom model of your choice. Make sure to pick the same folder structure you used in the dataset maker.

▶️ Processing: These settings change how your dataset will be processed. The resolution should stay at 512 this time, which is normal for Stable Diffusion; increasing it makes training much slower, but it does help with finer details. flip_aug is a trick to learn more evenly, as if you had more images, but it makes the AI confuse left and right, so it's your choice. shuffle_tags should always stay active if you use anime tags, as it makes prompting more flexible and reduces bias. activation_tags is important: set it to 1 if you added one during the dataset part of the guide. This is also called keep_tokens.

▶️ Steps: We need to pay attention here. There are 4 variables at play: your number of images, the number of repeats, the number of epochs, and the batch size. These result in your total steps. You can choose to set the total epochs or the total steps; we will look at some examples in a moment. Too few steps will undercook the Lora and make it useless, and too many will overcook it and distort your images. This is why we choose to save the Lora every few epochs, so we can compare and decide later. For this reason, I recommend few repeats and many epochs. There are many ways to train a Lora. The method I personally follow focuses on balancing the epochs, such that I can choose between 10 and 20 epochs depending on whether I want a fast cook or a slow simmer (which is better for styles). Also, I have found that more images generally need more steps to stabilize. Thanks to the new min_snr_gamma option, Loras take fewer epochs to train. Here are some healthy values for you to try:
- 10 images × 10 repeats × 20 epochs ÷ 2 batch size = 1000 steps
- 20 images × 10 repeats × 10 epochs ÷ 2 batch size = 1000 steps
- 100 images × 3 repeats × 10 epochs ÷ 2 batch size = 1500 steps
- 400 images × 1 repeat × 10 epochs ÷ 2 batch size = 2000 steps
- 1000 images × 1 repeat × 10 epochs ÷ 3 batch size = 3300 steps
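The arithmetic behind those examples is simple, and a tiny helper lets you check whether your own numbers land in a healthy range before you start. This is just a sketch with names of my own choosing, not the colab's fields:

```python
import math

def total_steps(images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Steps per epoch = ceil(images * repeats / batch_size), multiplied by the number of epochs."""
    return math.ceil(images * repeats / batch_size) * epochs

# Reproduces the examples above:
print(total_steps(10, 10, 20, 2))   # 1000
print(total_steps(400, 1, 10, 2))   # 2000
print(total_steps(1000, 1, 10, 3))  # 3340, roughly the 3300 quoted above
```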
▶️ Learning: The most important settings. However, you don't need to change any of these your first time. In any case:
- The unet learning rate dictates how fast your Lora will absorb information. Like with steps, if it's too small the Lora won't do anything, and if it's too large the Lora will deepfry every image you generate. There's a flexible range of working values, especially since you can change the intensity of the Lora in prompts. Assuming you set dim between 8 and 32 (see below), I recommend 5e-4 unet for almost all situations. If you want a slow simmer, 1e-4 or 2e-4 will be better. Note that these are in scientific notation: 1e-4 = 0.0001.
- The text encoder learning rate is less important, especially for styles. It helps learn tags better, but the Lora will still learn them without it. It is generally accepted that it should be either half or a fifth of the unet; good values include 1e-4 or 5e-5. Use Google as a calculator if you find these small values confusing.
- The scheduler guides the learning rate over time. This is not critical, but it still helps. I always use cosine with 3 restarts, which I personally feel keeps the Lora "fresh". Feel free to experiment with cosine, constant, and constant with warmup; you can't go wrong with those. There's also the warmup ratio, which should help the training start efficiently, and the default of 5% works well.

▶️ Structure: Here is where you choose the type of Lora from the 2 I mentioned at the beginning. Also, dim/alpha determine the size of your Lora; larger does not usually mean better. I personally use 16/8, which works great for characters and is only 18 MB.

▶️ Ready: Now you're ready to run this big cell, which will train your Lora. It will take 5 minutes to boot up, after which it starts performing the training steps. In total it should take less than an hour, and it will put the results in your Google Drive.
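Before you hit run, it can help to see the values recommended above in one place. The summary below is only illustrative, written as a plain Python dict; the key names are mine and do not necessarily match the colab's actual fields:

```python
# Illustrative summary of the settings discussed above (not the trainer colab's real field names).
training_settings = {
    "resolution": 512,                          # keep at 512 for normal SD training
    "shuffle_tags": True,                       # recommended when using anime tags
    "keep_tokens": 1,                           # a.k.a. activation_tags; 1 if you added a trigger word
    "unet_lr": 5e-4,                            # 1e-4 or 2e-4 for a slower simmer
    "text_encoder_lr": 1e-4,                    # half to a fifth of the unet value
    "lr_scheduler": "cosine_with_restarts",     # with 3 restarts
    "lr_scheduler_num_cycles": 3,
    "lr_warmup_ratio": 0.05,                    # the 5% default
    "network_dim": 16,                          # dim/alpha of 16/8 gives ~18 MB character Loras
    "network_alpha": 8,
}
```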
🏁 Third Half: Testing
You read that right. I lied! 😈 There are 3 parts to this guide. When you finish your Lora you still have to test it to know if it's good. Go to your Google Drive, inside the /lora_training/outputs/ folder, and download everything inside your project name's folder. Each of these files is a different Lora saved at a different epoch of your training, and each has a number like 01, 02, 03, etc. Here's a simple workflow to find the optimal way to use your Lora:
1. Put your final Lora in your prompt with a weight of 0.7 or 1, and include some of the most common tags you saw during the tagging part of the guide. You should see a clear effect, hopefully similar to what you tried to train. Adjust your prompt until you're either satisfied or can't seem to get it any better.
2. Use the X/Y/Z plot to compare different epochs. This is a built-in feature in the webui: go to the bottom of the generation parameters and select the script. Put the Lora of the first epoch in your prompt (like "<lora:projectname-01:0.7>"), and in the script's X value write something like "-01, -02, -03", etc. Make sure the X value is in "Prompt S/R" mode. These will perform replacements in your prompt, causing it to go through the different versions of your Lora so you can compare their quality. You can first compare every 2nd or every 5th epoch if you want to save time. You should ideally generate batches of images to compare more fairly.
3. Once you've found your favorite epoch, try to find the best weight. Do an X/Y/Z plot again, this time with an X value like ":0.5, :0.6, :0.7, :0.8, :0.9, :1". It will replace a small part of your prompt to go over different Lora weights. Again, it's better to compare in batches. You're looking for a weight that gives the best detail without distorting the image.
If you want, you can do steps 2 and 3 together as X/Y; it'll take longer but be more thorough. If you found results you liked, congratulations! Keep testing different situations, angles, clothes, etc., to see if your Lora can be creative and do things that weren't in the training data.

source: civitai/holostrawberry
Area Composition

Get more specific generations each time! Have you ever heard of area composition? Area composition is a technique where you can specify custom locations and sizes for every element you want to generate. To build this simple but effective workflow, all you need are the following nodes:

1. Load Checkpoint: here you select your desired model.
2. Load LoRA: here you select your desired style with any LoRA (this one is optional).
3. CLIP Set Last Layer: this node works as your Clip Skip (set it to -2 for better results).
4. CLIP Text Encode: here is where your lovely prompt will be. You will need two of these, because one will work as your positive and the other as your negative.
5. KSampler: this node is important because it is like the brain of the main process; here your prompt and image size get read and transformed into an image. You can use whichever sampler and scheduler you like the most (set the denoise strength to 1.0 for better results).
6. Empty Latent Image: as important as the KSampler, this node is where you decide the size of your initial image (portrait or landscape).
7. CLIP Text Encode: wait, again? Yes. Just like the earlier ones, each of these nodes focuses on one specific element you want to generate. It is important to keep it simple and only describe the main element (you can have as many of these nodes as elements you want to place; keep in mind they only work as positives. For this example I will use 2 of them).
8. MultiArea Conditioning: this is the most important node of the process. For explanation purposes, I will call each of my positives a conditioning. Conditioning 0 is my first positive (the one from step 4); conditionings 1 and 2 are my second and third positives (the ones from step 7). It is very important to know that for each conditioning you have to set a size and position. In this example I set conditioning 0 to 512x718 because it is the base prompt and I want it to cover the whole canvas; conditioning 1, my main character, is 384x576 in the lower center of the canvas; and conditioning 2, the background/setting, is 512x718 because I want the whole canvas to work as the background (see the small layout sketch at the end of this article). You may notice that while setting each conditioning's position, a different color shows up on the MultiArea Conditioning node; keep calm, these colors are just a visual representation of where each element goes. Also important: as you may have figured out, this node works as a very detailed composition instruction, so it will act as your positive; be sure to connect it as the positive input of your KSampler.
9. Upscale Latent: up to this point we have only created the base image, so it is time to upscale it. The Upscale Latent node not only upscales the image to the desired size but also introduces more detail in the process.
10. KSampler: yes, again. This second KSampler works together with the Upscale Latent node to refine details, so using the same configuration as your first one (step 5) is a good idea. Lowering the denoise strength on this second KSampler helps avoid drastic changes; for this example I set it to 0.5.
11. VAE Decode: the variational autoencoder (VAE) decode node is important because it turns the finished latent into an actual image, your beautiful masterpiece.
12. Preview/Save Image: lastly, all that is left to add is the Preview/Save Image node (this one does not need an explanation, right?).

And there you go, you will now be able to generate more personalized images. Intended image to create: cyborg girl inside abandoned building. Do not forget to set this article as favorite if you found it useful. Happy generations!
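To make the region bookkeeping in step 8 concrete, here is a minimal sketch of the layout from this example as plain Python. The class and helper names are mine, the per-region prompts are my guess at how the example splits up, and none of this is ComfyUI's actual API:

```python
from dataclasses import dataclass

@dataclass
class Region:
    prompt: str
    width: int
    height: int
    x: int = 0
    y: int = 0

canvas_w, canvas_h = 512, 718  # size of the Empty Latent Image in this example

regions = [
    # conditioning 0: base prompt covering the whole canvas
    Region("cyborg girl inside abandoned building", canvas_w, canvas_h),
    # conditioning 1: main character, 384x576, centered horizontally, lower part of the canvas
    Region("cyborg girl", 384, 576, x=(canvas_w - 384) // 2, y=canvas_h - 576),
    # conditioning 2: background/setting, again covering the whole canvas
    Region("abandoned building interior", canvas_w, canvas_h),
]

for i, r in enumerate(regions):
    print(f"conditioning {i}: {r.width}x{r.height} at ({r.x},{r.y}) -> {r.prompt}")
```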
