Transcript of ComfyUI Tutorial Series Ep 64 Nunchaku Qwen Image Edit 2509

Video Transcript:

Welcome to episode 64 of our ComfyUI tutorial series. Today I will talk about Qwen Image Edit 2509, the newest and most advanced version of Qwen's image editing model. It not only lets you edit multiple images at the same time, but also does a much better job of keeping faces, products, and text consistent, while adding powerful support for control inputs like depth maps and pose maps, so your edits can be even more detailed, accurate, and realistic. I will show you the Nunchaku version, which is faster, so you can edit your images even quicker, in just four steps.

I made a new channel on Discord especially for the ComfyUI Easy Installer, so Ivo can post the updates he makes for it and it will be easier for you to find them. There you will also find how to update the Nunchaku version when a new release comes out, so you don't have to reinstall ComfyUI each time. Today, however, I will do a fresh install, because he changed the Python version to 3.12 and added some new files to the add-ons folder, and the installation should be even faster.

Now, let me show you how to do it. On the GitHub page you will find the installer: just scroll down, read the instructions, and the download is there. Click to download. I already have a few versions installed, and it doesn't interfere with the existing ComfyUI, so I will just make a new folder, give it a name that makes sense to you, and save the zip archive there. Then navigate to that folder, right-click, and extract the archive contents; you can delete the zip archive afterwards. Now we have all we need to install. You don't have to do this, but I like to have fewer folders nested inside other folders, so I move the files up one level and delete the original folder. This way ComfyUI installs directly in this folder instead of in a folder inside another folder with the same name.

Open this file to start the installer. If it asks you to update or install Git, say yes, and then you can take a break for five minutes. When you come back it will say installation complete; it took 367 seconds on my computer. Press any key to exit. This is the folder I talked about earlier, the one where I didn't want a folder inside a folder with the same easy-install name.

So now we have a working ComfyUI, but to run Nunchaku workflows we need a few extra steps. Go to the add-ons folder and you will see some optional .bat files you can run if you need them. I will install SageAttention because it speeds up some models, so it is good to have, and the installation only takes a few seconds. Then for Nunchaku there are two options; I will choose the latest version. If I run that .bat, you can see which version of Nunchaku it installs for me. If you watch this a month from now, Ivo might have updated it to install a newer version, so check the updates on Discord. Now we have Nunchaku installed.

You can start ComfyUI from here normally, or you can start it with SageAttention enabled. On some computers Qwen might not work with SageAttention enabled and will generate black images; if that happens, start normally using the other file. For me it works fine with SageAttention. If you don't have another version of ComfyUI installed, you can skip the step I am doing in the next minute.
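If you want to double-check that SageAttention and Nunchaku really landed in the embedded Python, a small check like the sketch below can help. The package names sageattention and nunchaku, and the python_embeded folder name, are assumptions on my part, not something taken from the installer itself.

# Quick sanity check that the add-ons landed in the embedded Python.
# Assumption: SageAttention and Nunchaku install as the packages
# "sageattention" and "nunchaku". Run this with the python.exe inside
# the install's embedded Python folder (often named python_embeded,
# but the exact name may differ in the easy installer).
import importlib

import torch

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

for name in ("sageattention", "nunchaku"):
    try:
        module = importlib.import_module(name)
        print(f"{name}: installed, version {getattr(module, '__version__', 'unknown')}")
    except ImportError:
        print(f"{name}: NOT found - rerun the matching .bat from the add-ons folder")

If both lines print a version, the add-ons are in place; if one is missing, running its .bat again is the first thing to try.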
But I do have an older version of ComfyUI where all my downloaded models are, and I want to make a shortcut to them so I don't duplicate the models in the new ComfyUI. Copy this extra model path maker file and navigate to your old ComfyUI with the models. Go until you find the models folder, and inside that folder paste the path maker file. Run it and you can see what it created: paths for where all the models are and in which folders. You can close this text file. Now copy the newly created extra model paths file, then go to your new ComfyUI folder and find the folder named ComfyUI. You should see a similar file there with the .example extension. Paste your file there and we should be good to go.

Let's start our ComfyUI using this .bat file. As you can see, it added those extra search paths, and you can also see it is on the new Python version 3.12 along with the PyTorch version it is using, running with SageAttention enabled. By the way, if you didn't install InsightFace, like me, you might see a warning when ComfyUI starts saying there is no insightface module. You can safely ignore that, or install that add-on if you think you will use it. I don't install it because it is licensed for personal use only and my channel is monetized. Anyway, let's move forward, and as you can see, ComfyUI starts normally without a problem.

Each time I install a new ComfyUI I adjust a few settings, but this is optional. I go to Settings and search for KayTool, and I turn off the little dino that floats over my windows using this button. Then for node alignment, which otherwise appears all the time, I make it show only when I select at least two nodes, so I enable this option. Next I go to the rgthree settings: I enable auto nest so it shows groups in organized subdirectories, and I reduce the number of items needed for nesting. You can also enable show fast toggles in group headers. Then click save. These are my settings. Now if I select multiple nodes, I can see the alignment tool that lets me align them how I want, but it disappears when no nodes are selected. This is quite useful for creating more organized and better-looking workflows.

We have all we need now, so let's test some workflows. You can drag workflows into the ComfyUI window or open them from the file menu. Let's open this simple one first. All the workflows are available for free on Discord; check the video description. I included in this pixaroma note everything you need to get started. I already showed you how to install ComfyUI and Nunchaku. Now we need a Nunchaku model for the new Qwen. There are two types of models depending on the video card you have: if you have a 40-series card or lower, go with the INT4 models, and if you have a 50-series card, go with the FP4 models. So for my RTX 4090 video card I am using this model, but you should download the one that fits your card. I put together a list with all available models: on top are the FP4 models for the 50 series, and below them are the INT4 models for everyone else. In these file names, r32 means the model was quantized with a low rank factor of 32, which makes it lighter, faster, and more VRAM-friendly, but with lower quality and more approximation errors. r128 means a higher rank factor of 128, which keeps much more detail and gives higher image quality, but at the cost of slower speed and higher memory usage. So if you have under 12 GB of VRAM, I recommend going with r32.
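For reference, that extra model paths file is just a small YAML mapping from model types to folders. Here is a minimal Python sketch that writes a similar file by hand; the section name, the listed keys, and the D:\ComfyUI_old path are illustrative assumptions, so adjust them to where your old models actually live. It mirrors what the path maker .bat produces for me, it is not that script itself.

# Minimal sketch: write an extra_model_paths.yaml that points the new
# ComfyUI at an older install's models folder. Keys shown are common
# model folders; OLD_COMFY is a hypothetical example path.
from pathlib import Path

OLD_COMFY = Path(r"D:\ComfyUI_old\ComfyUI")  # hypothetical path to the old install

yaml_text = f"""
old_comfyui:
  base_path: {OLD_COMFY.as_posix()}
  checkpoints: models/checkpoints
  clip: models/clip
  diffusion_models: models/diffusion_models
  vae: models/vae
  loras: models/loras
  controlnet: models/controlnet
"""

# Save this next to the *.example file inside the new install's ComfyUI folder.
Path("extra_model_paths.yaml").write_text(yaml_text.strip() + "\n", encoding="utf-8")
print("wrote extra_model_paths.yaml")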
And if you have more than 12 GB of VRAM, go with r128 like I do. Then there is the full version and the versions with fewer steps, one with four steps and one with eight steps. What I use today in this tutorial is the r128 version with four steps. So choose your version, download it, and navigate to your ComfyUI folder. Look for the models folder, and inside you should see the diffusion models folder; that is where you put the model. As you can see, I included the folder where you need to place it here. After it is downloaded, you can refresh ComfyUI by pressing R so it can see the model, then load it in this node. Do the same for the clip model by placing it in the clip folder, and the same for the VAE model in the vae folder.

Once you have them all downloaded and loaded, you can test the workflow. If you used Qwen Edit before, it is quite similar. What is different are these text encoder nodes. If we double-click on the canvas and search for a text encoder for Qwen, we now have two: the Edit one we used before, and the Edit Plus one that now has three image inputs instead of one. So we don't use the first one; we use the second one with Plus in the name. That is the node I used here. All I changed was the color, green and red for positive and negative. In this workflow I don't use any negative text because I set the CFG scale to 1, so it is faster and ignores any text you add there. For steps I used only four because I chose the model with four steps. If you choose the one with eight, change this value to eight, or if you use the full model, change it to 30 steps or more. But I prefer to have both quality and speed, so for me the four-step model is good enough.

Let's test it by uploading an image. I will use this cartoon bunny. The Scale Image to Total Pixels node prevents you from feeding in images that are too big, since the model cannot work well if the images are oversized, but you can bypass it if you want to try different sizes. After the image is scaled to a reasonable size, it goes to the text encoder. You need to connect it to both the positive and the negative, even though the workflow will run with only the positive connected. Then we have the prompt in the positive prompt field. This is instructive, not descriptive, which means you put instructions there about what you want to change. Then you run it. The first time is slower, but the second time should be faster.

And here is what we got. I added a compare node so we can see before and after, and it did a really good job changing the background while keeping everything consistent. It is better than the previous Qwen Edit, and I think it is better than Nano Banana from Google on many projects. Let's change winter to summer to see how it handles that. And I got this one. Now let's try something quite different, like a desert. This time it replaced the background but didn't replace the flowers in the foreground, so it needs more prompt adjustment to make it work. Always try multiple seeds to see if a new seed fixes it; in this case, it kept the foreground. So let's add to the prompt to remove the vegetation. It removed more of it, but still left some blurred flowers. Then I added dry plants in the foreground, and now it worked just great. Many times you can figure it out in a few steps, and other times you have to explain it step by step, like to a robot. It can handle long prompts, so you can keep adding changes, but sometimes it might miss some things if there are too many instructions.
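To give an idea of what the Scale Image to Total Pixels node is doing, here is a rough Python equivalent with Pillow; the one-megapixel target and the bunny.png file name are assumptions for illustration, not the node's own code.

# Rough stand-in for the Scale Image to Total Pixels node: resize an image
# so its area is about one megapixel while keeping the aspect ratio.
# The 1.0 MP default is an illustrative value, not taken from the node.
from PIL import Image

def scale_to_total_pixels(path: str, megapixels: float = 1.0) -> Image.Image:
    img = Image.open(path)
    target_area = megapixels * 1_000_000
    scale = (target_area / (img.width * img.height)) ** 0.5
    new_size = (max(1, round(img.width * scale)), max(1, round(img.height * scale)))
    return img.resize(new_size, Image.LANCZOS)

resized = scale_to_total_pixels("bunny.png")  # hypothetical file name
print(resized.size)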
It is better to add two or three instructions and then use the rest of the prompt to describe how the image should look after those instructions. Anyway, this is the best free model I have tested so far, so it is really worth trying.

Let's move to something more realistic to see if it can handle that. I am using this portrait of a woman, and for the prompt I will instruct her to hold a paper with the text love pixaroma on it. And this is the result: it added the paper and kept the face quite consistent, I would say. Let's test some emotions and make her look sad. Again, it seems to work just fine and looks quite similar. Let's make her angry and yelling. It did that too, but she still looks quite cute even when angry. Let's see if it knows left and right. When you refer to left, it means the left side of the image, not the character's point of view. And yes, it can do the right side too. So it is quite good for creating a dataset for LoRA training even if you have only a single image, although you might not even need a character LoRA if it can already do anything you want with one image. We can also generate a back view of the same woman.

Now let's move to the next workflow with multiple input images. The first part of the workflow is the same, with the image and its nodes, but what is different is that I added another two images, so we have a total of three inputs. Here is how I did it: I created a group that you can enable or disable, so you can disable both the second and third image and run with only one like in the previous workflow, or enable the second one to run with two images, or enable the third to run with all three inputs. Let's upload a photo of a man, then the woman's photo for the second image, and for the third image let's add a dog. For the prompt, I will write that the man and the woman are a couple taking a selfie and the woman is holding the dog. And we got this result, with the man and woman taking a selfie and the dog included, so all three images are combined into one.

I can disable the last image and leave only the man and woman. Let's create a wedding photo of them together. I just gave both images to ChatGPT to generate a prompt, and we got this image. Now, if the characters are farther in the distance and the faces are smaller, some details get lost and it might not work as well as with close-ups; it seems a bit more blurry and less detailed, so that's something to keep in mind. Let's replace the man with a bag. I have this pixaroma bag I created with the same model and my logo. Now I instruct it to generate an image of the woman holding the bag at a fashion presentation in a professional setting. And look at this result; it is quite good at combining multiple images.

I forgot to mention that, compared with the first workflow, this one doesn't use the first image as the size reference for generation. It uses an empty latent with a resolution calculator node, so I can easily choose any ratio I want. So instead of the first image going to VAE Encode and then to the latent input, we have the empty latent going to the latent input. This way I can select any ratio. Let's try this one. And here is the result in the ratio we asked for; it outpainted the image to make it fit. Let's do a classic portrait ratio. I will choose a car as the first image, and for the prompt I will say the woman is in front of the car, with more details about the lighting to make it look more professional, but you can also keep it simple. And I got this image.
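For the curious, the resolution calculator boils down to simple math: take an aspect ratio and a pixel budget, then round to dimensions the model is comfortable with. A minimal sketch, assuming a one-megapixel budget and rounding to multiples of 16 (both of which are my assumptions, not the node's exact values):

# Sketch of a resolution calculator: turn an aspect ratio plus a pixel
# budget into a width/height pair rounded to multiples of 16.
# The 1 MP budget and the multiple of 16 are illustrative assumptions.
def resolution_from_ratio(ratio_w: int, ratio_h: int,
                          megapixels: float = 1.0, multiple: int = 16) -> tuple[int, int]:
    target_area = megapixels * 1_000_000
    scale = (target_area / (ratio_w * ratio_h)) ** 0.5
    width = round(ratio_w * scale / multiple) * multiple
    height = round(ratio_h * scale / multiple) * multiple
    return width, height

print(resolution_from_ratio(2, 3))   # classic portrait ratio -> (816, 1232)
print(resolution_from_ratio(16, 9))  # widescreen ratio -> (1328, 752)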
You can create some interesting generations by combining different images and prompting in creative ways. Now let's make her wear a dress I chose. I have this dress with flowers, and I will prompt for the woman to wear that dress on a neutral background. And here is the result. Try different seeds to get different results. For speed, it takes about 12 to 17 seconds on my computer, depending on the size you use for the image. You can delete the resolution calculator node and use any size you want, but past a certain size the consistency of the character might suffer.

Let me show you another workflow with an extra node for preprocessors. In this case, I will show you how to use a pose as a reference. I have this portrait of a man. Then for the second image I will use another person's photo to borrow that pose; let's say I take this image of a woman. Here we have the preprocessor node, which I simply named pose, and I selected the DW preprocessor. You can disable the third image, but in this case I want to add a custom background too, so I will choose this beach background. For the prompt I tried something simple: the man looks surprised on the beach. So it should take the man from the first image, the pose from the second image, and the background from the third image. And this is the result. How crazy is that? We placed our man in the pose we wanted and in the environment we wanted. Depending on what you generate, you can try different preprocessors like depth or Canny. If you don't want to use the pose and instead want to use the actual image, set it to none; then it will use the actual image if it is mentioned in the prompt. Since I didn't mention the woman in the prompt, I only got the man looking surprised, and because I didn't use the preprocessor to extract the pose, it was just a normal pose. We can add the woman to the prompt if we want, and then she will appear in the final image. Or we can take it even further and create interactions, like them hugging or something similar. It is quite fun to play with.

Let me show you a few more examples. Let's say I like this floral pattern on a dress and I have this armchair that I want to have the same floral pattern. With those two images, I can add a prompt like replace the armchair texture pattern with the floral pattern from the dress. And when I run it, I get something like this. There might be cases where it doesn't work, but as you can see, in this case it worked just fine. Here is another example with this bunny: I wanted to replace the character with a turtle, and this way I get a consistent style very easily, because it tends to replace things in the same style as the image, which is quite useful. I have here the Mona Lisa painting, and I wanted to replace the woman with a blonde medieval queen in a red outfit. It kept the face consistent since I didn't ask for changes to the face. Another example is where I added a witch hat and a black cat; the hat and cat are not perfectly integrated, but close enough. Here I have my logo symbol, and I wanted to make it look like gold embroidery on a pillow, and the result is quite nice. For this woman, I wanted a different hairstyle with bangs and shorter hair. Or I could change the clothing to warrior armor if I wanted, and it integrated that quite well too. If you have sketches, you can render them in different art styles. For 3D renderings it is quite good, as you can see in the result I got here. For illustrations it also works well.
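If you are wondering what a preprocessor actually outputs, here is a tiny sketch of the Canny case with OpenCV, since Canny is one of the options mentioned above. The DW pose preprocessor is a separate model-based estimator, so treat this only as an illustration of the general idea, with arbitrary example thresholds and a hypothetical file name.

# Illustration of what a preprocessor does: turn a reference photo into a
# control image (here, Canny edges) that the edit model can follow.
# The thresholds 100/200 are arbitrary example values.
import cv2

image = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file name
edges = cv2.Canny(image, 100, 200)
cv2.imwrite("reference_canny.png", edges)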
For example, I can replace a castle with a volcano, and I got a volcano in a similar style. I can even right-click on an image, copy it, and then paste it into the Load Image node with Ctrl+V, or just load it from the output folder. Then I can replace the palm tree with a tower if I want, and again it fits great in the same art style. Here on the t-shirt I had the word Qwen, and I wanted it replaced with Pixa, and it fit perfectly, better than any mockup I could do in Photoshop. If you have old photos, you can also try to restore them, remove scratches, and colorize them like I did here.

The Qwen model is quite powerful. For example, it can remove items and clothes, so use it responsibly, since it is not as censored as other models. It is also quite useful for animations, to create different emotions, rotate the character, and tell all kinds of stories. For photo manipulations it is also useful; for example, I wanted to create a monster burger with teeth from an existing burger, and it integrated the teeth quite well. Some images need more prompting. For example, I wanted the scene to be inside an airplane instead of a car, but it still kept the chair and some items, although it replaced the window with one from an airplane. So it tries to keep things consistent, maybe too consistent sometimes. Here I had a little frog on a robot's head, and I wanted it replaced with a black cat with orange eyes.

The Nunchaku team says it should work even with 8 GB of VRAM if you play with the settings; just tweak the num blocks on GPU and pin memory options to see what works best for your system. But if you cannot make it work, or if you want to use it on a laptop with a weak video card or no video card at all, you can try running it in the cloud. I added the workflow on RunningHub so you can try it there; I will add a link in the description. You just click the launch-in-the-cloud button and it is all set: no need to download models, no need to install ComfyUI, everything is ready to use like you see here. All you need to do is upload the image you want to edit. Let's say I add a dog. Wait a few seconds for the upload to finish. Then you can enable the second or third image if you want, depending on what you are trying to do. After that, you add the prompt with instructions about what to change, for example, the dog is wearing a pirate outfit. Then you adjust the ratio to match your uploaded image if you want to keep it similar, but you can also try different ratios. And that is all; you press run. The speed depends on what subscription you have; I have a Plus subscription, so I get better speed.

On the right you can see it started to generate. The first run is usually slower, and in general it is slower than on a local PC because it runs in the cloud and the video card is shared with other people. Loading the model usually takes the most time, but once it passes that node, it goes pretty fast. It took 41 seconds for the first run, and I got this cute dog dressed up as a pirate. From here you can download the image. If I want, I can copy the image, select the Load Image node, and paste it there; wait a few seconds after pasting for it to upload. Now let's change the prompt: we replace the background with the deck of a pirate ship, and the dog is the captain of that pirate ship. Let's run it. This time it took 26 seconds, so sometimes it is faster, sometimes slower, like with any online service.
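Before tweaking the num blocks on GPU and pin memory settings, it helps to know how much VRAM you actually have. A quick check with PyTorch, assuming an NVIDIA card and a CUDA build of PyTorch, could look like the sketch below; the 12 GB threshold just mirrors the r32 versus r128 advice from earlier, it is not a hard rule.

# Quick VRAM check before tuning offload settings like num blocks on GPU
# and pin memory. Assumes an NVIDIA GPU and PyTorch with CUDA installed.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / (1024 ** 3)
    print(f"{props.name}: {vram_gb:.1f} GB VRAM")
    if vram_gb < 12:
        print("Under 12 GB: consider the r32 models and more aggressive offloading.")
    else:
        print("12 GB or more: the r128 four-step model should be comfortable.")
else:
    print("No CUDA device detected; consider a cloud option like RunningHub.")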
I usually use it more on weekends because I don't have access to my PC with the good video card. Let's try another image, maybe this woman. For the prompt, I will say the woman is wearing a vintage outfit, maybe something steampunk. Now, because she is farther away, it is not as accurate, or at least for this seed it wasn't perfect. On portraits and close-ups it gives much better results because there are more pixels to work with. Let's enable the second image and add the man. For the prompt, let's put them both together inside a car. And I got exactly what I asked for. I now use it more than Nano Banana because it is quite good, it is free, it has no watermarks, and it is less censored.

That is all for today. Thank you, Legends, and thanks to everyone who subscribed to the membership and supports this channel. If you found something useful, leave a like and a comment to help with the YouTube algorithm. Check Discord for workflows and updates on Nunchaku. If you have problems running it, make sure you post a screenshot on Discord with your system details, what video card you are using, and which workflows you tried, so it is easier for members to help you. Thank you for watching and have a great day.

Channel: pixaroma
