Philosophy.AI's Articles

Stylistic QR Code with Stable Diffusion

source: anfu.me (now you can easyly create QRcode with tensor.art inside controlnet, next time i will create guide about that)Yesterday, I created this image using Stable Diffusion and ControlNet, and shared on Twitter and Instagram – an illustration that also functions as a scannable QR code.The process of creating it was super fun, and I’m quite satisfied with the outcome.In this post, I would like to share some insights into my learning journey and the approaches I adopted to create this image. Additionally, I want to take this opportunity to credit the remarkable tools and models that made this project possible.Get into the Stable DiffusionThis year has witnessed an explosion of mind-boggling AI technologies, such as ChatGPT, DALL-E, Midjourney, Stable Diffusion, and many more. As a former photographer also with some interest in design and art, being able to generate images directly from imagination in minutes is undeniably tempting.So I started by trying Midjourney, it’s super easy to use, very expressive, and the quality is actually pretty good. It would honestly be my recommendation for anyone who wants to get started with generative AI art.By the way, Inès has also delved into it and become quite good at it now, go check her work on her new Instagram account  @a.i.nes.On my end, being a programmer with strong preferences, I would naturally seek for greater control over the process. This brought me to the realm of Stable Diffusion. I started with this guide: Stable Diffusion LoRA Models: A Complete Guide. The benefit of being late to the party is that there are already a lot of tools and guides ready to use. Setting up the environment quite straightforward and luckily my M1 Max’s GPU is supported.QR Code ImageA few weeks ago, nhciao on reddit posted a series of artistic QR codes created using Stable Diffusion and ControlNet. The concept behind them fascinated me, and I defintely want to make one for my own. So I did some research and managed to find the original article in Chinese: Use AI to Generate Scannable Images. The author provided insights into their motivations and the process of training the model, although they did not release the model itself. On the other hand, they are building a service called QRBTF.AI to generate such QR code, however it is not yet available.Until another day I found an community model QR Pattern Controlnet Model on CivitAI. I know I got to give it a try!SetupMy goal was to generate a QR code image that directs to my website while elements that reflect my interests. I ended up taking a slightly cypherpunk style with a character representing myself :PDisclaimer: I’m certainly far from being an expert in AI or related fields. In this post, I’m simply sharing what I’ve learned and the process I followed. My understanding may not be entirely accurate, and there are likely optimizations that could simplify the process. If you have any suggestions or comments, please feel free to reach out using the links at the bottom of the page. Thank you!1. Setup EnvironmentI pretty much follows Stable Diffusion LoRA Models: A Complete Guide to install the web ui AUTOMATIC1111/stable-diffusion-webui, download models you are interested in from CivitAI, etc. As a side note, I found that the user experience of the web ui is not super friendly, some of them I guess are a bit architectural issues that might not be easy to improve, but luckily I found a pretty nice theme canisminor1990/sd-webui-kitchen-theme that improves a bunch of small things.In order to use ControlNet, you will also need to install the Mikubill/sd-webui-controlnet extension for the web ui.Then you can download the QR Pattern Controlnet Model, putt the two files (.safetensors and .yaml) under stable-diffusion-webui/models/ControlNet folder, and restart the web ui.2. Create a QR CodeThere are hundreds of QR Code generators full of adds or paid services, and we certainly don’t need those fanciness – because we are going to make it much more fancier 😝!So I end up found the QR Code Generator Library, a playground of an open source QR Code generator. It’s simple but exactly what I need! It’s better to use medium error correction level or above to make it more easy recognizable later. Small tip that you can try with different Mask pattern to find a better color destribution that fits your design.3. Text to ImageAs the regular Text2Image workflow, we need to provide some prompts for the AI to generate the image from. Here is the prompts I used:Prompts(one male engineer), medium curly hair, from side, (mechanics), circuit board, steampunk, machine, studio, table, science fiction, high contrast, high key, cinematic light, (masterpiece, top quality, best quality, official art, beautiful and aesthetic:1.3), extreme detailed, highest detailed, (ultra-detailed)Negative Prompts(worst quality, low quality:2), overexposure, watermark, text, easynegative, ugly, (blurry:2), bad_prompt,bad-artist, bad hand, ng_deepnegative_v1_75tThen we need to go the ControlNet section, and upload the QR code image we generated earlier. And configure the parameters as suggested in the model homepage.Then you can start to generate a few images and see if it met your expectations. You will also need to check if the generated image is scannable, if not, you can tweak the Start controling step and End controling step to find a good balance between stylization and QRCode-likeness.4. I’m feeling lucky!After finding a set of parameters that I am happy with, I will increase the Batch Count to around 100 and let the model generate variations randomly. Later I can go through them and pick one with the best conposition and details for further refinement. This can take a lot of time, and also a lot of resources from your processors. So I usually start it before going to bed and leave it overnight.Here are some examples of the generated variations (not all of them are scannable):From approximately one hundred variations, I ultimately chose the following image as the starting point:It gets pretty interesting composition, while being less obvious as a QR code. So I decided to proceed with it and add add a bit more details. (You can compare it with the final result to see the changes I made.)5. Refining DetailsUpdate: I recently built a toolkit to help with this process, check my new blog post 👉 Refine AI Generated QR Code for more details.The generated images from the model are not perfect in every detail. For instance, you may have noticed that the hand and face appear slightly distorted, and the three anchor boxes in the corner are less visually appealing. We can use the inpaint feature to tell the model to redraw some parts of the image (it would better if you keep the same or similiar prompts as the original generation).Inpainting typically requires a similar amount of time as generating a text-to-image, and it involves either luck or patience. Often, I utilize Photoshop to "borrow" some parts from previously generated images and utilize the spot healing brush tool to clean up glitches and artifacts. My Photoshop layers would looks like this:After making these adjustments, I’ll send the combined image back for inpainting again to ensure a more seamless blend. Or to search for some other components that I didn’t found in other images.Specifically on the QR Code, in some cases ControlNet may not have enough prioritize, causing the prompts to take over and result in certain parts of the QR Code not matching. To address this, I would overlay the original QR Code image onto the generated image (as shown in the left image below), identify any mismatches, and use a brush tool to paint those parts with the correct colors (as shown in the right image below).I then export the marked image for inpainting once again, adjusting the Denoising strength to approximately 0.7. This would ensures that the model overrides our marks while still respecting the color to some degree.Ultimately, I iterate through this process multiple times until I am satisfied with every detail.6. UpscalingThe recommended generation size is 920x920 pixels. However, the model does not always generate highly detailed results at the pixel level. As a result, details like the face and hands can appear blurry when they are too small. To overcome this, we can upscale the image, providing the model with more pixels to work with. The SD Upscaler script in the img2img tab is particularly effective for this purpose. You can refer to the guide Upscale Images With Stable Diffusion for more information.7. Post-processingLastly, I use Photoshop and Lightroom for subtle color grading and post-processing, and we are done!The one I end up with not very good error tolerance, you might need to try a few times or use a more forgiving scanner to get it scanned :PAnd using the similarly process, I made another one for Inès:ConclusionCreating this image took me a full day, with a total of 10 hours of learning, generating, and refining. The process was incredibly enjoyable for me, and I am thrilled with the end result! I hope this post can offer you some fundamental concepts or inspire you to embark on your own creative journey. There is undoubtedly much more to explore in this field, and I eager to see what’s coming next!Join my Discord Server and let’s explore more together!If you want to learn more about the refining process, go check my new blog post: Refining AI Generated QR Code.ReferencesHere are the list of resources for easier reference.ConceptsStable DiffusionControlNetToolsHardwares & Softwares I am using.AUTOMATIC1111/stable-diffusion-webui - Web UI for Stable Diffusioncanisminor1990/sd-webui-kitchen-theme - Nice UI enhancementMikubill/sd-webui-controlnet - ControlNet extension for the webuiQR Code Generator Library - QR code generator that is ad-free and customisableAdobe Photoshop - The tool I used to blend the QR code and the illustrationModelsControl Net Models for QR Code (you can pick one of them)QR Pattern Controlnet ModelControlnet QR Code MonsterIoC Lab Control NetCheckpoint Model (you can use any checkpoints you like)Ghostmix Checkpoint - A very high quality checkpoint I use. You can use any other checkpoints you likeTutorialsStable Diffusion LoRA Models: A Complete Guide - The one I used to get started(Chinese) Use AI to genereate scannable images - Unfortunately the article is in Chinese and I didn’t find a English version of it.Upscale Images With Stable Diffusion - Enlarge the image while adding more details

Philosophy.AI

[Guide] Make your own Loras, easy and free

This article helped me to create my first Lora and upload it to Tensor.art, although Tensor.art has its own Lora Train , this article helps to understand how to create Lora well.🏭 PreambleEven if you don't know where to start or don't have a powerful computer, I can guide you to making your first Lora and more!In this guide we'll be using resources from my GitHub page. If you're new to Stable Diffusion I also have a full guide to generate your own images and learn useful tools.I'm making this guide for the joy it brings me to share my hobbies and the work I put into them. I believe all information should be free for everyone, including image generation software. However I do not support you if you want to use AI to trick people, scam people, or break the law. I just do it for fun.Also here's a page where I collect Hololive loras.📃What you needAn internet connection. You can even do this from your phone if you want to (as long as you can prevent the tab from closing).Knowledge about what Loras are and how to use them.Patience. I'll try to explain these new concepts in an easy way. Just try to read carefully, use critical thinking, and don't give up if you encounter errors.🎴Making a Lorat has a reputation for being difficult. So many options and nobody explains what any of them do. Well, I've streamlined the process such that anyone can make their own Lora starting from nothing in under an hour. All while keeping some advanced settings you can use later on.You could of course train a Lora in your own computer, granted that you have an Nvidia graphics card with 6 GB of VRAM or more. We won't be doing that in this guide though, we'll be using Google Colab, which lets you borrow Google's powerful computers and graphics cards for free for a few hours a day (some say it's 20 hours a week). You can also pay $10 to get up to 50 extra hours, but you don't have to. We'll also be using a little bit of Google Drive storage.This guide focuses on anime, but it also works for photorealism. However I won't help you if you want to copy real people's faces without their consent.🎡 Types of LoraAs you may know, a Lora can be trained and used for:A character or personAn artstyleA poseA piece of clothingetcHowever there are also different types of Lora now:LoRA: The classic, works well for most cases.LoCon: Has more layers which learn more aspects of the training data. Very good for artstyles.LoHa, LoKR, (IA)^3: These use novel mathematical algorithms to process the training data. I won't cover them as I don't think they're very useful.📊 First Half: Making a DatasetThis is the longest and most important part of making a Lora. A dataset is (for us) a collection of images and their descriptions, where each pair has the same filename (eg. "1.png" and "1.txt"), and they all have something in common which you want the AI to learn. The quality of your dataset is essential: You want your images to have at least 2 examples of: poses, angles, backgrounds, clothes, etc. If all your images are face close-ups for example, your Lora will have a hard time generating full body shots (but it's still possible!), unless you add a couple examples of those. As you add more variety, the concept will be better understood, allowing the AI to create new things that weren't in the training data. For example a character may then be generated in new poses and in different clothes. You can train a mediocre Lora with a bare minimum of 5 images, but I recommend 20 or more, and up to 1000.As for the descriptions, for general images you want short and detailed sentences such as "full body photograph of a woman with blonde hair sitting on a chair". For anime you'll need to use booru tags (1girl, blonde hair, full body, on chair, etc.). Let me describe how tags work in your dataset: You need to be detailed, as the Lora will reference what's going on by using the base model you use for training. If there is something in all your images that you don't include in your tags, it will become part of your Lora. This is because the Lora absorbs details that can't be described easily with words, such as faces and accessories. Thanks to this you can let those details be absorbed into an activation tag, which is a unique word or phrase that goes at the start of every text file, and which makes your Lora easy to prompt.You may gather your images online, and describe them manually. But fortunately, you can do most of this process automatically using my new 📊 dataset maker colab.Here are the steps:1️⃣ Setup: This will connect to your Google Drive. Choose a simple name for your project, and a folder structure you like, then run the cell by clicking the floating play button to the left side. It will ask for permission, accept to continue the guide.If you already have images to train with, upload them to your Google Drive's "lora_training/datasets/project_name" (old) or "Loras/project_name/dataset" (new) folder, and you may choose to skip step 2.2️⃣ Scrape images from Gelbooru: In the case of anime, we will use the vast collection of available art to train our Lora. Gelbooru sorts images through thousands of booru tags describing everything about an image, which is also how we'll tag our images later. Follow the instructions on the colab for this step; basically, you want to request images that contain specific tags that represent your concept, character or style. When you run this cell it will show you the results and ask if you want to continue. Once you're satisfied, type yes and wait a minute for your images to download.3️⃣ Curate your images: There are a lot of duplicate images on Gelbooru, so we'll be using the FiftyOne AI to detect them and mark them for deletion. This will take a couple minutes once you run this cell. They won't be deleted yet though: eventually an interactive area will appear below the cell, displaying all your images in a grid. Here you can select the ones you don't like and mark them for deletion too. Follow the instructions in the colab. It is beneficial to delete low quality or unrelated images that slipped their way in. When you're finished, send Enter in the text box above the interactive area to apply your changes.4️⃣ Tag your images: We'll be using the WD 1.4 tagger AI to assign anime tags that describe your images, or the BLIP AI to create captions for photorealistic/other images. This takes a few minutes. I've found good results with a tagging threshold of 0.35 to 0.5. After running this cell it'll show you the most common tags in your dataset which will be useful for the next step.5️⃣ Curate your tags: This step for anime tags is optional, but very useful. Here you can assign the activation tag (also called trigger word) for your Lora. If you're training a style, you probably don't want any activation tag so that the Lora is always in effect. If you're training a character, I myself tend to delete (prune) common tags that are intrinsic to the character, such as body features and hair/eye color. This causes them to get absorbed by the activation tag. Pruning makes prompting with your Lora easier, but also less flexible. Some people like to prune all clothing to have a single tag that defines a character outfit; I do not recommend this, as too much pruning will affect some details. A more flexible approach is to merge tags, for example if we have some redundant tags like "striped shirt, vertical stripes, vertical-striped shirt" we can replace all of them with just "striped shirt". You can run this step as many times as you want.6️⃣ Ready: Your dataset is stored in your Google Drive. You can do anything you want with it, but we'll be going straight to the second half of this tutorial to start training your Lora!⭐ Second Half: Settings and TrainingThis is the tricky part. To train your Lora we'll use my ⭐ Lora trainer colab. It consists of a single cell with all the settings you need. Many of these settings don't need to be changed. However, this guide and the colab will explain what each of them do, such that you can play with them in the future.Here are the settings:▶️ Setup: Enter the same project name you used in the first half of the guide and it'll work automatically. Here you can also change the base model for training. There are 2 recommended default ones, but alternatively you can copy a direct download link to a custom model of your choice. Make sure to pick the same folder structure you used in the dataset maker.▶️ Processing: Here are the settings that change how your dataset will be processed.The resolution should stay at 512 this time, which is normal for Stable Diffusion. Increasing it makes training much slower, but it does help with finer details.flip_aug is a trick to learn more evenly, as if you had more images, but makes the AI confuse left and right, so it's your choice.shuffle_tags should always stay active if you use anime tags, as it makes prompting more flexible and reduces bias.activation_tags is important, set it to 1 if you added one during the dataset part of the guide. This is also called keep_tokens.▶️ Steps: We need to pay attention here. There are 4 variables at play: your number of images, the number of repeats, the number of epochs, and the batch size. These result in your total steps.You can choose to set the total epochs or the total steps, we will look at some examples in a moment. Too few steps will undercook the Lora and make it useless, and too many will overcook it and distort your images. This is why we choose to save the Lora every few epochs, so we can compare and decide later. For this reason, I recommend few repeats and many epochs.There are many ways to train a Lora. The method I personally follow focuses on balancing the epochs, such that I can choose between 10 and 20 epochs depending on if I want a fast cook or a slow simmer (which is better for styles). Also, I have found that more images generally need more steps to stabilize. Thanks to the new min_snr_gamma option, Loras take less epochs to train. Here are some healthy values for you to try:10 images × 10 repeats × 20 epochs ÷ 2 batch size = 1000 steps20 images × 10 repeats × 10 epochs ÷ 2 batch size = 1000 steps100 images × 3 repeats × 10 epochs ÷ 2 batch size = 1500 steps400 images × 1 repeat × 10 epochs ÷ 2 batch size = 2000 steps1000 images × 1 repeat × 10 epochs ÷ 3 batch size = 3300 steps▶️ Learning: The most important settings. However, you don't need to change any of these your first time. In any case:The unet learning rate dictates how fast your Lora will absorb information. Like with steps, if it's too small the Lora won't do anything, and if it's too large the Lora will deepfry every image you generate. There's a flexible range of working values, specially since you can change the intensity of the lora in prompts. Assuming you set dim between 8 and 32 (see below), I recommend 5e-4 unet for almost all situations. If you want a slow simmer, 1e-4 or 2e-4 will be better. Note that these are in scientific notation: 1e-4 = 0.0001The text encoder learning rate is less important, specially for styles. It helps learn tags better, but it'll still learn them without it. It is generally accepted that it should be either half or a fifth of the unet, good values include 1e-4 or 5e-5. Use google as a calculator if you find these small values confusing.The scheduler guides the learning rate over time. This is not critical, but still helps. I always use cosine with 3 restarts, which I personally feel like it keeps the Lora "fresh". Feel free to experiment with cosine, constant, and constant with warmup. Can't go wrong with those. There's also the warmup ratio which should help the training start efficiently, and the default of 5% works well.▶️ Structure: Here is where you choose the type of Lora from the 2 I mentioned in the beginning. Also, the dim/alpha mean the size of your Lora. Larger does not usually mean better. I personally use 16/8 which works great for characters and is only 18 MB.▶️ Ready: Now you're ready to run this big cell which will train your Lora. It will take 5 minutes to boot up, after which it starts performing the training steps. In total it should be less than an hour, and it will put the results in your Google Drive.🏁 Third Half: TestingYou read that right. I lied! 😈 There are 3 parts to this guide.When you finish your Lora you still have to test it to know if it's good. Go to your Google Drive inside the /lora_training/outputs/ folder, and download everything inside your project name's folder. Each of these is a different Lora saved at different epochs of your training. Each of them has a number like 01, 02, 03, etc.Here's a simple workflow to find the optimal way to use your Lora:Put your final Lora in your prompt with a weight of 0.7 or 1, and include some of the most common tags you saw during the tagging part of the guide. You should see a clear effect, hopefully similar to what you tried to train. Adjust your prompt until you're either satisfied or can't seem to get it any better.Use the X/Y/Z plot to compare different epochs. This is a builtin feature in webui. Go to the bottom of the generation parameters and select the script. Put the Lora of the first epoch in your prompt (like "<lora:projectname-01:0.7>"), and on the script's X value write something like "-01, -02, -03", etc. Make sure the X value is in "Prompt S/R" mode. These will perform replacements in your prompt, causing it to go through the different numbers of your lora so you can compare their quality. You can first compare every 2nd or every 5th epoch if you want to save time. You should ideally do batches of images to compare more fairly.Once you've found your favorite epoch, try to find the best weight. Do an X/Y/Z plot again, this time with an X value like ":0.5, :0.6, :0.7, :0.8, :0.9, :1". It will replace a small part of your prompt to go over different lora weights. Again it's better to compare in batches. You're looking for a weight that results in the best detail but without distorting the image. If you want you can do steps 2 and 3 together as X/Y, it'll take longer but be more thorough.If you found results you liked, congratulations! Keep testing different situations, angles, clothes, etc, to see if your Lora can be creative and do things that weren't in the training data.source: civitai/holostrawberry

Philosophy.AI

Philosophy.AI

HOW TO CREATE LORA JUST UNDER 100 CREDIT (FLUX / SDXL / PONY / HUNYUAN DIT) BY EXPERIMENT BASED.

Quickstart Guide to Stable Video Diffusion

How to use nodes

Best settings for Stable Diffusion SDXL

Stylistic QR Code with Stable Diffusion

[Guide] Make your own Loras, easy and free

Here are three interesting facts about Stable Diffusion 3, the latest open-source AI model

Creating LoRA for AI Art: 4 Essential Preparations

Artist Inspired Styles for Stable Diffusion and Their Examples

How to Create Effective Prompts for AI Image Generation