PictureT (Tensor.Art)
🌐 https://picture-t.com/

Articles
Understanding the Impact of Negative Prompts: When and How Do They Take Effect?

📝 Synthical — "The Dynamics of Negative Prompts in AI: A Comprehensive Study" by Yuanhao Ban (UCLA), Ruochen Wang (UCLA), Tianyi Zhou (UMD), Minhao Cheng (PSU), Boqing Gong, and Cho-Jui Hsieh (UCLA).

This study addresses the gap in understanding the impact of negative prompts in AI diffusion models. By focusing on the dynamics of the diffusion steps, the research aims to answer the question: "When and how do negative prompts take effect?" The investigation categorizes the mechanism of negative prompts into two primary tasks: noun-based removal and adjective-based alteration.

The role of prompts in AI diffusion models is crucial for guiding the generation process. Negative prompts, which instruct the model to avoid generating certain features, have been studied far less than their positive counterparts. This study provides a detailed analysis of negative prompts, identifying the critical steps at which they begin to influence image generation.

Findings: Critical Steps for Negative Prompts

Noun-Based Removal: The influence of noun-based negative prompts peaks at the 5th diffusion step. At this critical step, the negative prompt initially generates the target object at a specific location within the image, then neutralizes the positive noise through a subtractive process, effectively erasing the object. However, introducing a negative prompt in the early stages paradoxically results in the generation of the specified object. The optimal timing for introducing these prompts is therefore after the critical step.

Adjective-Based Alteration: The influence of adjective-based negative prompts peaks around the 10th diffusion step. During the initial stages, the absence of the object leads to a subdued response. Between the 5th and 10th steps, as the object becomes clearer, the negative prompt accurately focuses on the intended area and maintains its influence.
Cross-Attention Dynamics

At the peak around the 5th step for noun-based prompts, the negative prompt attempts to generate objects in the middle of the image, regardless of the positive prompt's context. As this process approaches its peak, the negative prompt begins to assimilate layout cues from its positive counterpart, trying to remove the object. This represents the zenith of its influence. For adjective-based prompts, during the peak around the 10th step, the negative prompt maintains its influence on the intended area, accurately targeting the object as it becomes clear.

The study highlights the paradoxical effect of introducing negative prompts in the early stages of diffusion, leading to the unintended generation of the specified object. This finding suggests that the timing of negative prompt introduction is crucial for achieving the desired outcome.

Reverse Activation Phenomenon

A significant phenomenon observed in the study is Reverse Activation: a negative prompt introduced early in the diffusion process unexpectedly leads to the generation of the very object it names. To explain this, the researchers borrowed the concept of the energy function from Energy-Based Models to represent the data distribution. Real-world distributions often feature elements like clear blue skies or uniform backgrounds, alongside distinct objects such as the Eiffel Tower. These elements typically have low energy scores, making the model inclined to generate them. The energy function is designed to assign lower energy to more 'likely' or 'natural' images according to the model's training data, and higher energy to less likely ones. A positive energy difference indicates that the presence of the negative prompt effectively induces the inclusion of that component in the positive noise; in other words, the negative prompt promotes the formation of the object it was meant to suppress.
Without the negative prompt, implicit guidance is insufficient to generate the intended object; the application of a negative prompt intensifies the distribution guidance towards the object rather than preventing it from materializing. As a result, negative prompts typically do not attend to the correct place until around step 5, well after the positive prompt has taken hold. Using negative prompts in the initial steps can significantly skew the diffusion process and may alter the background.

Conclusions

- Use at least 10 sampling steps; beyond 25 steps, negative prompts make no further difference.
- Negative prompts can enhance your positive prompts, depending on how well the model and LoRA have learned their keywords, so they can be understood as an extension of their counterparts.
- Weighting up negative keywords may cause reverse activation and break your image; try keeping the influence of all your LoRAs and models balanced.

Reference: https://synthical.com/article/Understanding-the-Impact-of-Negative-Prompts%3A-When-and-How-Do-They-Take-Effect%3F-171ebba1-5ca7-410e-8cf9-c8b8c98d37b6?
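The "introduce the negative prompt only after the critical step" idea can be sketched as a toy classifier-free-guidance loop. This is a minimal illustration, not the paper's code; `denoise` stands in for a real U-Net call, and the names are made up for the sketch:

```python
# Toy sketch: delay the negative prompt until after the critical step
# to avoid reverse activation in the early diffusion steps.
CRITICAL_STEP = 5  # noun-based removal peaks around step 5 (per the study)

def cfg_step(latent, denoise, positive, negative, step, scale=7.0):
    """One guidance step; `denoise(latent, prompt)` is a hypothetical model call."""
    cond = denoise(latent, positive)
    # Before the critical step, use an empty negative so the negative
    # prompt cannot seed the object it is supposed to remove.
    neg = negative if step >= CRITICAL_STEP else ""
    uncond = denoise(latent, neg)
    return uncond + scale * (cond - uncond)
```

With a fixed scale of 7.0, steps 0-4 behave as if no negative prompt was given, and the subtractive erasure only kicks in from step 5 onward.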
Stable Diffusion [Floating Point, Performance in the Cloud]

Overview of Data Formats used in AI

fp32 is the default data format used for training, along with mixed-precision training that uses both fp32 and fp16. fp32 has more than adequate scale and precision to effectively train the most complex neural networks, but it also results in large models, both in parameter size and complexity. fp32 can represent numbers between roughly 10⁻⁴⁵ and 10³⁸. In most cases, such a wide range is wasteful and does not bring additional precision.

The fp16 data format is well supported in both hardware and software, with good performance. For AI inference workloads, adopting fp16 instead of the mainstream fp32 offers a tremendous speed-up while reducing power consumption and memory footprint, with virtually no accuracy loss. The switch to fp16 is seamless and does not require major code changes or fine-tuning; CPUs will improve their AI inference performance instantly. fp16 reduces the representable range to roughly 10⁻⁸ to 65,504 and cuts memory requirements in half while also accelerating training and inference speeds. Make sure to avoid underflow and overflow situations.

Once training is completed, one of the most popular ways to improve performance is to quantize the network. A popular data format used in this process, mainly in edge applications, is int8, which yields at most a 4x reduction in size with a notable performance improvement. However, quantization to int8 frequently leads to some accuracy loss. Sometimes the loss is limited to a fraction of a percent, but it often amounts to a few percent of degradation, and in many applications this degradation becomes unacceptable.
There are ways to limit accuracy loss by doing quantization-aware training, which consists of introducing the int8 data format selectively and/or progressively during training. It is also possible to quantize the weights while keeping activation functions at fp32 resolution. Though these methods help limit the accuracy loss, they do not eliminate it altogether.

fp16 is a data format that can be the right solution for preventing accuracy loss while requiring minimal or no conversion effort. Indeed, many benchmarks show that the transition from fp32 to fp16 results in no noticeable accuracy loss without any re-training.

Conclusion

For NVIDIA GPUs and AI, deploy in fp16 to double inference speed while reducing memory footprint and power consumption. Note: if the original model was not trained using fp16, its conversion to fp16 is extremely easy and does not require re-training or code changes. The switch to fp16 has also been shown to cause no visible accuracy loss in most cases.

Source: https://amperecomputing.com/
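The fp16 limits quoted above are easy to verify with NumPy (a quick sanity check of the format, not vendor code; `smallest_subnormal` assumes NumPy ≥ 1.22):

```python
import numpy as np

info = np.finfo(np.float16)
print(info.max)                  # 65504.0 — largest representable fp16 value
print(info.smallest_subnormal)   # ~6e-08 — the underflow floor

# Overflow: values past the max saturate to infinity.
x = np.float16(70000.0)
print(np.isinf(x))               # True
```

This is why the article warns about under- and overflow: anything larger than 65,504 silently becomes infinity in fp16, and values below ~6×10⁻⁸ flush to zero.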
Blender to Stable Diffusion, animation workflow.

Source: https://www.youtube.com/watch?v=8afb3luBvD8

Mickmumpitz shows an updated workflow for rendering 3D animations with AI, using Blender together with Stable Diffusion's node-based interface. He created simple test scenes, including a futuristic cityscape and a rope-balancing scene, to test the updated version.

The workflow involves setting up render passes in Blender, such as depth and normal passes, to extract information from the 3D scene for AI image generation, and creating a file output node to save them. Users can also create mask passes to communicate which prompts to use for individual objects in the scene.

Mickmumpitz explains the differences between Stable Diffusion 1.5 and SDXL for image generation and video rendering, highlighting the advantages and disadvantages of each, and demonstrates how to use Stable Diffusion 1.5 with Blender renders to generate specific styles and control the level of detail in the AI-generated scenes. Overall, the video demonstrates how these AI tools can make the rendering of 3D animations more efficient and versatile.
Stable Diffusion [ADetailer]

After Detailer (ADetailer)

After Detailer (ADetailer) is a game-changing extension designed to simplify image enhancement, particularly inpainting. This tool saves you time and proves invaluable for fixing common issues, such as distorted faces in your generated images. Historically, we would send the image to an inpainting tool and manually draw a mask around the problematic face area. ADetailer streamlines this process by automating it with the help of a face recognition model: it detects faces, automatically generates the inpaint mask, then proceeds with inpainting by itself.

Exploring ADetailer Parameters

Now that you've grasped the basics, let's delve into additional parameters that allow fine-tuning of ADetailer's functionality.

Detection Model: ADetailer offers various detection models, such as face_xxxx, hand_xxxx, and person_xxxx, catering to specific needs. Notably, the face_yolo and person_yolo models, based on YOLO (You Only Look Once), excel at detecting faces and objects, yielding excellent inpainting results.

Model Selection: The "8n" and "8s" models vary in speed and power, with "8n" being faster and smaller. Choose the model that suits your detection needs, switching to "8s" if detection proves challenging.

ADetailer Prompting: Input your prompts and negatives in the ADetailer section to achieve the desired results.

Detection Model Confidence Threshold: The minimum confidence score needed for a detection to count. Lower values (e.g., 0.3) are advisable for detecting faces. Adjust as necessary to increase or reduce detections.

Mask Min/Max Area Ratio: These parameters control the allowed size range for detected masks. Raising the minimum area ratio can help filter out undesired small objects.

The most crucial setting in the Inpainting section is the "Inpaint denoising strength," which determines the level of denoising applied during automatic inpainting. Adjust it to achieve your desired degree of change.
In most cases, selecting "Inpaint only masked" is recommended when inpainting faces.

Reference: ThinkDiffusion
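The confidence threshold and mask area ratio settings amount to a simple filter over the detector's raw output. A minimal sketch of that logic (hypothetical detection dicts and function name, not ADetailer's actual code):

```python
def filter_detections(detections, image_area, conf_threshold=0.3,
                      min_area_ratio=0.0, max_area_ratio=1.0):
    """Keep detections that pass both the confidence and mask-size checks."""
    kept = []
    for det in detections:  # each det: {"confidence": float, "area": pixels}
        ratio = det["area"] / image_area
        if (det["confidence"] >= conf_threshold
                and min_area_ratio <= ratio <= max_area_ratio):
            kept.append(det)
    return kept
```

Raising `min_area_ratio` is how you drop tiny background faces so only the main subject gets re-inpainted.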
TagGUI - captioning tool for model creators

📥 Download: https://github.com/jhc13/taggui

A cross-platform desktop application for quickly adding and editing image tags and captions, aimed at creators of image datasets for generative AI models like Stable Diffusion.

Features
- Keyboard-friendly interface for fast tagging
- Tag autocomplete based on your own most-used tags
- Integrated Stable Diffusion token counter
- Automatic caption and tag generation with models including CogVLM, LLaVA, WD Tagger, and many more
- Batch tag operations for renaming, deleting, and sorting tags
- Advanced image list filtering

Captioning parameters

Prompt: Instructions given to the captioning model. Prompt formats are handled automatically based on the selected model. You can use the following template variables to dynamically insert information about each image into the prompt:
- {tags}: The tags of the image, separated by commas.
- {name}: The file name of the image without the extension.
- {directory} or {folder}: The name of the directory containing the image.

An example prompt using a template variable could be: Describe the image using the following tags as context: {tags}. With this prompt, {tags} would be replaced with the existing tags of each image before the prompt is sent to the model.

Start caption with: Generated captions will start with this text.

Remove tag separators in caption: If checked, tag separators (commas by default) will be removed from the generated captions.

Discourage from caption: Words or phrases that should not be present in the generated captions. You can separate multiple words or phrases with commas (,). For example, you can put appears,seems,possibly to prevent the model from using an uncertain tone in the captions. The words may still be generated due to limitations related to tokenization.

Include in caption: Words or phrases that should be present somewhere in the generated captions. You can separate multiple words or phrases with commas (,).
You can also allow the captioning model to choose from a group of words or phrases by separating them with |. For example, if you put cat,orange|white|black, the model will attempt to generate captions that contain the word cat and either orange, white, or black. It is not guaranteed that all of your specifications will be met. Tags to exclude (WD Tagger models): Tags that should not be generated, separated by commas. Many of the other generation parameters are described in the Hugging Face documentation.
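The template variables behave like plain string substitution over each image's metadata. A rough illustration of the idea (hypothetical helper, not TagGUI's internals):

```python
import os

def fill_prompt(template, image_path, tags):
    """Expand {tags}, {name}, {directory}/{folder} for one image."""
    folder = os.path.basename(os.path.dirname(image_path))
    name = os.path.splitext(os.path.basename(image_path))[0]
    return (template.replace("{tags}", ", ".join(tags))
                    .replace("{name}", name)
                    .replace("{directory}", folder)
                    .replace("{folder}", folder))

prompt = fill_prompt(
    "Describe the image using the following tags as context: {tags}.",
    "dataset/cats/tabby_01.png",
    ["cat", "orange", "sitting"])
# → "Describe the image using the following tags as context: cat, orange, sitting."
```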
Stable Diffusion [Parameters]

Stable Diffusion Intro

Stable Diffusion is an open-source text-to-image AI model that can generate amazing images from given text in seconds. The model was trained on images in the LAION-5B dataset (Large-scale Artificial Intelligence Open Network). It was developed by CompVis, Stability AI, and RunwayML. All research artifacts from Stability AI are intended to be open sourced.

Prompt Engineering

Prompt engineering is the process of structuring words that can be interpreted and understood by a text-to-image model. It is the language you need to speak in order to tell an AI model what to draw: a well-written prompt consists of keywords and good sentence structure. Once you have something in mind, ask yourself a list of questions. Do you want a photo, a painting, digital art? What's the subject: a person, an animal, the painting itself? What details are part of your idea? Special lighting: soft, ambient, etc. Environment: indoor, outdoor, etc. Color scheme: vibrant, muted, etc. Shot: front, from behind, etc. Background: solid color, forest, etc. What style: illustration, 3D render, movie poster?

The order of words is important. The order and presentation of your desired output is almost as important as the vocabulary itself. It is recommended to list your concepts explicitly and separately rather than trying to cram them into one simple sentence.

Keywords and Sub-Keywords

Keywords are words that can change the style, format, or perspective of the image. There are certain magic words or phrases that are proven to boost the quality of the image. Sub-keywords are those that belong to the semantic group of a keyword; hierarchy is important for prompting as well as for LoRA and model design.

Classifier Free Guidance (CFG, default is 7)

You can understand this parameter as "AI creativity vs. {{user}} prompt". Lower numbers give the AI more freedom to be creative, while higher numbers force it to stick to the prompt.
CFG {2-6}: if you're discovering, testing, or researching and want heavy AI influence.
CFG {7-10}: if you have a solid prompt but still want some creativity.
CFG {10-15}: if your prompt is solid enough and you do not want the AI disturbing your idea.
CFG {16-20}: not recommended; leads to incoherent output.

Step Count

Stable Diffusion creates an image by starting with a canvas full of noise and denoising it gradually to reach the final output; this parameter controls the number of these denoising steps. Usually higher is better, but only to a certain degree; beginners are recommended to stick with the default.

Seed

The seed is a number that controls the initial noise, and it is the reason you get a different image each time you generate even when all the parameters are fixed. By default, in most implementations of Stable Diffusion, the seed automatically changes every time you generate an image. You can get the same result back if you keep the prompt, the seed, and all other parameters the same. ⚠️ Seeding is important for your creations, so try to save a good seed and slightly tweak the prompt to get what you're looking for while keeping the same composition.

Sampler

Diffusion samplers are the methods used to denoise the image during generation; they take different durations and different numbers of steps to reach a usable image. This parameter interacts significantly with the step count; a refined sampler can reduce or increase the steps needed, giving more or less subjective detail.

CLIP Skip

First of all, we need to know what CLIP is. CLIP, which stands for Contrastive Language-Image Pretraining, is a multi-modal model trained on 400 million (image, text) pairs. During the training process, a text encoder and an image encoder are jointly trained to predict which caption goes with which image, as shown in the diagram below.
Think of this as the size of a funnel that SD uses to comb through information obtained from its dataset: big numbers leave a lot of information to process, so the final image is less precise, while lower numbers narrow down the captions in the dataset, giving more accurate results.

Clip Skip {1}: strong matches and less liberty.
Clip Skip {2}: nicer matches and a little liberty.
Clip Skip {3-5}: many matches and high liberty.
Clip Skip {6}: unexpected results.

ENSD (Eta Noise Seed Delta)

It's like a slider for the seed parameter; you can get different image results for a fixed seed number. So... what is the optimal number? There isn't one. Just use your lucky number; you're offsetting the seeding by this number. If you are using a random seed every time, ENSD is irrelevant. So why do people commonly use 31337? Known as "eleet" in leetspeak, a system of modified spellings used primarily on the Internet, it is a cabalistic number; it is safe to use any other number.

References: Automatic1111, OpenArt Prompt Book, LAION, LAION-5B Paper, 1337
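The seed behavior described above is just deterministic noise initialization; a quick NumPy demonstration of why a fixed seed reproduces the same starting point:

```python
import numpy as np

# Fixed seed → identical initial noise → reproducible generations
# (prompt and all other parameters held constant).
noise_a = np.random.default_rng(seed=1234).standard_normal((4, 4))
noise_b = np.random.default_rng(seed=1234).standard_normal((4, 4))
print(np.array_equal(noise_a, noise_b))  # True

# A different seed gives different starting noise, hence a different image.
noise_c = np.random.default_rng(seed=1235).standard_normal((4, 4))
print(np.array_equal(noise_a, noise_c))  # False
```

This is also why ENSD matters only with a fixed seed: it shifts the number fed to the generator, so the "same" seed yields different noise across setups that use different ENSD values.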
Stable Diffusion [Weight Syntax]

Weight (individual CFG for keywords): a colon sets a weight on a keyword, changing its default value (1.00 = default = x).

( ) Round brackets raise a keyword's weight by a factor of 1.1 per nesting level, e.g. (red) means red:1.10
(keyword) means x·1.1; if x=1 ⇒ 1.10
((keyword)) means x·1.1²; if x=1 ⇒ 1.21
(((keyword))) means x·1.1³; if x=1 ⇒ ≈1.33
((((keyword)))) means x·1.1⁴; if x=1 ⇒ ≈1.46

+ Plus works the same way, e.g. red+ means red:1.10
keyword+ ⇒ 1.10
keyword++ ⇒ 1.21
keyword+++ ⇒ ≈1.33
keyword++++ ⇒ ≈1.46 … etc.

[ ] Square brackets lower a keyword's weight by a factor of 0.9 per nesting level, e.g. [red] means red:0.90
[keyword] means x·0.9; if x=1 ⇒ 0.90
[[keyword]] means x·0.9²; if x=1 ⇒ 0.81
[[[keyword]]] means x·0.9³; if x=1 ⇒ ≈0.73
[[[[keyword]]]] means x·0.9⁴; if x=1 ⇒ ≈0.66

- Minus works the same way, e.g. red- means red:0.90
keyword- ⇒ 0.90
keyword-- ⇒ 0.81
keyword--- ⇒ ≈0.73
keyword---- ⇒ ≈0.66 … etc.

In theory you can combine these, or even bypass the limit values (0.00 - 2.00) with the correct script or modification in your dashboard.
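The bracket arithmetic reduces to multiplying by 1.1 (or 0.9) once per nesting level. A small helper to compute the effective weight (illustrative, matching A1111-style emphasis syntax):

```python
def keyword_weight(base=1.0, up=0, down=0):
    """Effective weight after `up` round brackets / '+' marks
    and `down` square brackets / '-' marks."""
    return base * (1.1 ** up) * (0.9 ** down)

print(round(keyword_weight(up=2), 2))    # 1.21  == ((keyword))
print(round(keyword_weight(down=3), 2))  # 0.73  == [[[keyword]]]
```

Note the exact values are 1.1ⁿ and 0.9ⁿ (e.g. 0.9³ = 0.729), so tables that list 0.72 or 0.65 are simply truncating rather than rounding.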
