Transcript of QWEN Image Edit - Extra Super Powers Edition ComfyUI Workflow!
Video Transcript:
I've powered up last week's Qwen Edit workflow with even more ease at your fingertips, as it's simply the best free and open-source model I've played with so far. What do you get now? Well, there are multiple masking options, automatic or manual; a little speed-up thanks to torch compile; an easy way to switch between an image or empty latent; and, to make things a bit more like Qwen Chat, I've also added AI prompt enhancement, which lets you chat with the thing instead of, well, writing the prompts yourself. Vision models are actually aware of the image you're passing in, so all sorts of fun can be had there. You can upgrade for free just by taking notes from this video and copying everything I've done here for yourself at home. Pause, take screenshots, do whatever you need to do. Or, if you find this stuff helpful and would like all the work done for you already, you can support the channel via Patreon and get all these modifications ready to go. Your support lets me make even more workflows for you and share these videos with everyone. The choice is yours. And a huge thank you to all supporters, because you make this possible. Oh, I've enabled channel memberships now as well, which give you priority comments and some very British emojis. Jolly good. Diving into this workflow, then: those familiar with the channel will know that I like to use the rodent method. This is because I think neatly crafted workflows are easier both to use and to follow. Plus, you have less of that distracting spaghetti everywhere. First up are some improvements in the loading group. I've made a little note there so you can see why I've made these particular changes. CFG norm doesn't make a whole lot of difference when you're using the version 2 four-step Lightning LoRA, or indeed the eight-step one, but if you're happy with things taking a little longer without the Lightning LoRA, then it can make a difference.
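The video doesn't say what CFG norm does internally, so here is a minimal sketch of one plausible form of CFG normalisation, assuming it rescales the guided prediction so its magnitude never exceeds the conditional prediction's. This is not the node's actual code: `cfg_norm`, its `strength` parameter and the whole-vector L2 norm are hypothetical simplifications (a real implementation would work on latent tensors).

```python
import math


def cfg_combine(uncond, cond, scale):
    """Classifier-free guidance: move from the unconditional prediction
    towards (and past) the conditional one by `scale`."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def cfg_norm(uncond, cond, scale, strength=1.0):
    """Hypothetical CFG normalisation: after guidance, shrink the result so its
    L2 norm is no larger than the conditional prediction's norm."""
    guided = cfg_combine(uncond, cond, scale)
    guided_norm = l2_norm(guided)
    if guided_norm == 0.0:
        return guided
    # Only ever scale down (factor <= 1), never boost.
    factor = min(1.0, l2_norm(cond) / guided_norm)
    # strength=0 leaves plain CFG untouched; strength=1 applies the full rescale.
    factor = 1.0 + strength * (factor - 1.0)
    return [x * factor for x in guided]


uncond, cond = [0.0, 0.0], [3.0, 4.0]
print(cfg_combine(uncond, cond, 2.5))  # [7.5, 10.0], norm 12.5
print(cfg_norm(uncond, cond, 2.5))     # pulled back to the conditional norm of 5
```

At a CFG of 1 the guided prediction equals the conditional one, so this kind of rescale has nothing to shrink, which is consistent with CFG norm mattering less in low-CFG setups.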
Torch compile speeds things up a little after your first generation and works best on 40-series cards or newer. So, if you do have one of those, the option is there. No worries if not, of course, as you can simply bypass the node or remove it altogether. There we are: highlight it, press B, and that's bypassed. The same goes for CFG norm. What is it that CFG norm does anyway? Well, we can take a look here. I am flying around a library in my costume, and I've got a prompt over here. There it is, the positive prompt text: the books are made of Stilton and the shelves are made of ancient oak. As I'm not using the Lightning LoRA here, I've got a CFG of 2.5 and 20 steps, just like in this little note here about the KSampler settings. Here we have the result with CFG norm. Pretty decent, I think. And now again without CFG norm: also pretty decent, but certainly different. There it is again with, and once again without. Which do you prefer? Well, whichever version it is, you can now more easily disable or enable that node because, as you might have noticed, bypassing a node just before a Set node can cause some issues, which is why I moved it over there. If you are using the Lightning LoRA, then the change isn't quite as pronounced, but then four steps is ever so much faster than 20. So here we have it with the Lightning LoRA enabled, and here it is with the Lightning LoRA but without CFG norm. So there it is without, and once again with, and as you can see there's practically no difference whatsoever. Now, what other stuff do I have for you? I guess masking and all those latent options are a good thing to look at next. Taking a look at image one, then, you can see I'm passing the mask through and also setting the image mask there in that new node. This is the image I'm passing in, and I'm going to change her hair into some cheese. This is the prompt I'm using to do that: her hair is made of cheese. Fairly simple and straightforward.
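To put "four steps is ever so much faster than 20" into rough numbers, here is a back-of-the-envelope sketch counting diffusion-model forward passes. It assumes that a CFG other than 1.0 needs both a conditional and an unconditional pass per step, and that the Lightning LoRA is run at CFG 1.0, where the sampler can skip the unconditional pass; the video only states the 2.5 CFG and step counts, so treat the CFG 1.0 setting as an assumption.

```python
def model_evaluations(steps: int, cfg: float) -> int:
    """Rough count of diffusion-model forward passes for one image.

    Assumption: CFG != 1.0 needs a conditional and an unconditional pass per
    step, while CFG == 1.0 lets the sampler skip the unconditional pass.
    """
    passes_per_step = 1 if abs(cfg - 1.0) < 1e-9 else 2
    return steps * passes_per_step


base = model_evaluations(steps=20, cfg=2.5)      # 40 passes
lightning = model_evaluations(steps=4, cfg=1.0)  # 4 passes
print(f"{base} vs {lightning} passes, about {base / lightning:.0f}x fewer")
```

This prints `40 vs 4 passes, about 10x fewer`. Text encoding, VAE decoding and any torch compile warm-up are ignored, so real wall-clock savings will be smaller.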
Image one is where the masking happens. One way to do a manual mask is to right-click on that image and open it in the Mask Editor. You can then draw the mask. There we go. Eventually you'll go around and select all of her hair. Once you have masked everything you want to change (doesn't my masking look perfect there?), you will need to go up here and pick the image mask option. Yes, I've added loads of latent options, so we really should take a look at what's going on here. There are now four latent options, as you can see here. Enable latent empty. Enable latent image mask, which is the one we're using here because we've done our mask manually. Enable latent image, which just passes the image in, the same as last week. Or enable latent remove background, which is an automatic option that uses Robust Video Matting to create the mask for you; if you have a preferred background-removal node, that is easy enough to swap in. Those four latent options are controlled by the groups I'm going to show you now. The image option is the same one as last week and just sends the whole image in. The empty latent option is handy as it means you can pick a totally different width and height. The image mask option is the one we're using now, and that applies your perfectly drawn mask via the Set Latent Noise Mask node. And finally, the remove background option, as mentioned, uses Robust Video Matting. This is much like the image mask option, but as you can see, the mask is done automatically for you. If you fancy, you could also switch it around. There we go. So rather than using the inverted mask, if we use the mask as-is, that will keep the background and remove the foreground. How do we know which is the correct latent to send into the KSampler? Well, we use the power of the Any Switch. The options are chosen in order.
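Why does a noise mask keep the rest of the picture in place? A minimal sketch of the idea, simplified to a single blend after denoising (a real sampler re-applies the mask at every step and works on latent tensors, not small lists): where the mask is 1 the new content is used, where it is 0 the original latent is kept. The function names here are hypothetical, and the inversion shows why flipping a foreground mask from background removal moves the edit from the subject to the background.

```python
def apply_noise_mask(original, denoised, mask):
    """Keep `original` where mask == 0, take `denoised` where mask == 1.
    Values in between give a soft blend (e.g. feathered mask edges)."""
    return [
        [m * d + (1.0 - m) * o for o, d, m in zip(o_row, d_row, m_row)]
        for o_row, d_row, m_row in zip(original, denoised, mask)
    ]


def invert_mask(mask):
    """Swap the edited and protected regions."""
    return [[1.0 - m for m in row] for row in mask]


original = [[0.0, 0.0], [0.0, 0.0]]
denoised = [[9.0, 9.0], [9.0, 9.0]]
hair_mask = [[1.0, 0.0], [0.0, 0.0]]  # only the top-left "pixel" is edited

print(apply_noise_mask(original, denoised, hair_mask))               # [[9.0, 0.0], [0.0, 0.0]]
print(apply_noise_mask(original, denoised, invert_mask(hair_mask)))  # [[0.0, 9.0], [9.0, 9.0]]
```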
So, if you accidentally enable all of the options, the first one there, the image latent, is the one that will be chosen by default. Marvelous stuff. But how does masking compare to an empty or image latent? Well, let's take a look. To start with, here we have the result from the masked latent option. That's the original image, and as I go across there, her hair gets turned into cheese. Delicious. However, if we now take a look at the image latent option and go across, you'll see it does much the same thing, but oh, there's been a slight shift in reality, and everything isn't quite in the same place as it was before. So that's one reason you may prefer to use a mask. For the automatic background removal option, make sure you've selected only that one, like you can see here. And remember, it doesn't matter if you've still got a mask on your image, as it won't be using that one. For the prompt, this time I'm using something slightly different because I'm changing the background and not her hair: the background is a massive piece of space Stilton recursively flowing through cheese time to form cheesometric patterns. And the result looks like this. Pretty gouda. Now, one thing you may notice is that the background removal isn't quite perfect: if we have a look at this bit of hair down here, there's a tiny bit of the old background still there. Don't worry, though, as you can tweak that by changing the mask. This time I've changed the expand value to -64, meaning the mask gets shrunk a little. When we take a look at this result, that really obvious bit by her hair on the left has gone, and the cheese is even more recursive. Of course, every image will be different and some background removal nodes are better than others, but you can expand or shrink the mask as you see fit. Next up is the AI prompting. For this, you enable the AI prompt option at the top and then, well, whatever else you want.
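The expand value works like morphological dilation (positive) or erosion (negative). A minimal sketch, assuming a binary mask and one 3x3 square-neighbourhood pass per pixel of expansion; the function name `grow_mask` is hypothetical and this is not the node's actual implementation, which may use a different kernel or soft masks. An expand of -64 would simply run 64 erosion passes, peeling stray edge pixels such as leftover background around hair.

```python
def grow_mask(mask, expand):
    """Dilate (expand > 0) or erode (expand < 0) a binary mask by |expand|
    pixels, one 3x3 square-neighbourhood pass per pixel."""
    h, w = len(mask), len(mask[0])
    current = [row[:] for row in mask]
    for _ in range(abs(expand)):
        nxt = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                # Neighbourhood clipped to the image bounds.
                neighbours = [
                    current[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))
                ]
                # Dilation: on if any neighbour is on. Erosion: on only if all are.
                nxt[y][x] = max(neighbours) if expand > 0 else min(neighbours)
        current = nxt
    return current


# A 3x3 block of "foreground" in a 5x5 mask.
mask = [[1 if 1 <= x <= 3 and 1 <= y <= 3 else 0 for x in range(5)] for y in range(5)]
for row in grow_mask(mask, -1):  # shrink by one pixel: only the centre survives
    print(row)
```

Pure Python like this is far too slow for full-size masks with 64 passes; it is only meant to show what a negative expand does.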
In this case, I'm going to use image one and an empty latent. For the prompt, I've simply asked for the style to be changed to some type of painting. As you can see, I can be completely vague here and the AI will do its magic for me. How does it do that? Well, we've got the AI prompting group up here. This uses Ollama locally, and as you can see in the Ollama Connectivity node, I'm using Gemma 3 4B. It's a pretty decent vision model. You will need Ollama installed already, of course, and you can pick whatever Ollama-supported model you fancy. The best thing about this model is that at 4B it's nice and small, which means it's also very fast when it comes to generating the prompt, and you don't have to strip out any thinking data. If you set the keep alive to zero, the model is unloaded straight after generating, so it doesn't sit on precious VRAM either, which is very handy. If you're a nerd like me, there are plenty of options in the Ollama Options node, too, so tweak away. In the Ollama Generate node, I've got a pretty basic system prompt, and you may wish to replace that with one of your own, but of course, British English is always the best thing to use. The Preview Any node shows the prompt it has generated, and in this case Gemma can obviously see what is going on in that image, which is two rats. It can even read the text on the sign, so we know the vision capability is doing its job. This way, your prompts can become instructions instead, and you get a whole new level of image editing fun. Much like the way we pick the latent, we can also use one of those Any Switches to choose between the AI prompt and the base prompt you type in. Here we are, down in the updated prompt section, and the new node here is just a get prompt node, which is connected up to the text entry area. The result in this case is, unsurprisingly, the image in a painting style, and each time you run it, you'll get a different style.
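Outside ComfyUI, the same kind of request can be sent straight to Ollama's local REST API. A minimal sketch, assuming Ollama is running on its default port and the `gemma3:4b` model has been pulled; `build_request` and `generate_prompt` are hypothetical helper names, and the system prompt text is only an example, not the one used in the workflow. `keep_alive: 0` asks Ollama to unload the model as soon as the response is returned.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(image_bytes, instruction, system_prompt, model="gemma3:4b"):
    """Payload for /api/generate: a vision model gets the image as base64."""
    return {
        "model": model,
        "system": system_prompt,
        "prompt": instruction,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # one JSON reply instead of streamed chunks
        "keep_alive": 0,  # unload the model straight after answering
    }


def generate_prompt(payload, url=OLLAMA_URL):
    """Send the payload to a running Ollama server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


payload = build_request(
    b"...raw PNG bytes would go here...",
    "Change the style to some type of painting.",
    "Write a detailed image-editing prompt in British English.",
)
# generate_prompt(payload) needs a live Ollama server, so it is not called here.
```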
Super handy for the lazy among us who can't be bothered to change the prompt each time: of course, you can just get an AI to do it for you. And with all this AI stuff coming thick and fast, don't forget to like and subscribe for even more Nerdy Rodent geekery. Nerdy Rodent, he really makes my day, showing us AI in a really British way.
Channel: Nerdy Rodent