Transcript of Image2video Wan 2.2 5b for ComfyUI
Video Transcript:
So, comparing these results from 5B, right, Wan 2.2: we have the six-step one, we have a 10-step, 15-step, 20-step, and 30-step generation. If we look at the first one here, the six-step generation, well, it isn't amazing. Now, we are not using FastWan, LightX2V, or anything like that, so this is native at, uh, 720, well, actually 704 by 1280, and you really have to go up to about 10 steps to see some improvement here. Now, it's not amazing. I would say at about 15 steps it's really passable and pretty good, and at 20 and 30 we have a good result. The thing here is that we have a fairly close-up image, the portrait here of this woman's face, so we can use lower steps. If you're using something different, like this woman walking here, we can clearly see that we have to increase the number of steps, right? Especially if you look at the sides here, the grass or the growing foliage, it really starts breaking down if we have a lower step count. With that said, we are using a swap technique: we're using a higher CFG for the first part of the generation and a lower CFG on the second part of the generation. I'll show you that once we go over the workflow.

Hello, you beautiful people. Today we're checking out Wan 2.2 with the 5B model. So, the 5B model is technically the most advanced model right now: it uses the new Wan 2.2 VAE, whereas the Wan 2.2 14B model still uses the 2.1 VAE. So while the 14B can technically produce better quality because it's 14B, the 5B is technically the leap forward, I guess. Oh, and what do you call a magic dog? A labracadabrador.

Now, this is the workflow that we're going to be using today. You can download it by using the link in the description; that will get you to this guide, which is also a text guide on how to get everything installed. You just scroll down to the bottom here. Here you have downloads. So there's an arrow here, attachments. Just download this one here, right? This one's free.
You can download it; you don't need to be a paying member. If you do want to support my Patreon, thank you very much. That's how I support this channel.

Now, let's get back to the actual workflow. So, let me guide you through what's going on here. Right. We have Wan 2.2, and once everything is installed, it's just going to be super simple. You're loading an image. You're setting your frames; if you want to change that, it's up to you. You're writing a prompt. If you want to set the folder and a file name, I've coded this for you here. You can also change the frame rate for your generation if you prefer; we're running this at 24 frames per second right now.

Now, what you need to do is install the models here. Well, actually, first you have to install the custom nodes. So, if you don't have all the beautiful installed nodes here and you have a couple of red ones, you actually have to press Manager, then Install Missing Custom Nodes, and there's going to be a bunch here. You select them, press Install, restart, and you're good to go. Then it's going to look something like this. With me so far? Great.

Now, for the models, there is a note here. I'm going to put this in the description as well. So, the Wan model is the one you can find in the Model Manager. What does that mean? Well, you press Manager, then Model Manager, you search for "wan 2.2", and here you have a bunch. You're looking for the, uh, there we go: TI2V, so text/image-to-video 5B. We're using that 5B one. You can also just search "5B" because it's the only 5B model available right now. Install that. The text encoder you have to download from this link; place it in your models/text_encoders folder. That's inside of your ComfyUI folder, then models, then text_encoders. Slap that right in there. The VAE goes in your models/vae folder. You get that from this link here. Now, if you are having errors with Triton, Python, xFormers, SageAttention, whatever...
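As a rough sketch of where those files are expected to land, here is a small check script. This is my own illustration, not part of the workflow: the `ComfyUI` root path is a placeholder for wherever your install lives, and the subfolder names are the standard ComfyUI ones mentioned above.

```python
from pathlib import Path

# Hypothetical ComfyUI root; replace with your actual install location.
COMFYUI_ROOT = Path("ComfyUI")

# Standard model subfolders used in the video: the Wan diffusion model,
# the text encoder, and the Wan 2.2 VAE.
expected = {
    "diffusion model": COMFYUI_ROOT / "models" / "diffusion_models",
    "text encoder":    COMFYUI_ROOT / "models" / "text_encoders",
    "VAE":             COMFYUI_ROOT / "models" / "vae",
}

for kind, folder in expected.items():
    status = "found" if folder.is_dir() else "missing"
    print(f"{kind:16s} -> {folder}  [{status}]")
```

If a folder shows up as missing, you are likely pointing at the wrong install directory (portable ComfyUI builds nest the folder one level deeper).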
Hey, just take the log. So what is the log? If you take what goes in here when you get the error, copy-paste all of that, send it into ChatGPT, and tell it, "Hey, I'm having issues with this," it will actually tell you what to do. It'll give you a command, then you go into your command prompt, and it's going to say something like "pip install triton" or whatever, right, and that'll solve it for you. Super amazing. Then you should be good to go. If you want to use, uh, FastWan or any optional LoRA, you have to install that separately. There are notes for that here.

Okay, now everything should be installed for you. So what you've got to do is drop an image in, right? You drop an image in here, and it is automatically going to be resized based on these sizes. But it's not going to be 1280 x 1280 unless it's, well, a square. It's going to keep the proportions of your image. The best size for Wan 2.2 5B is 704 x 1280, right? So if this image is 704 x 1280, it's going to look at the highest resolution, 1280. Okay, then we're not changing the image: keep proportion. Then it's going to stay 704 x 1280. Your image is always going to be resized to a multiple of 32. If it's not, I think you can just set that to 16, too; I guess that's beside the point right now. If you aren't doing that, you're going to get errors, right? So, if you try to load this without resizing your image, and without having an image that is a multiple of 16 or 32, you're going to get errors. That's why we're doing this automatically, so you can upload any image and it's going to be fixed for you, right?

We're setting the number of frames. If you want to do less, I would do, you know, 51 or something; 121 is going to take a lot of time. You're writing your prompt: "Woman turns her head and looks to the side, showing her face in side view." This is your positive prompt. The negative prompt goes down here.
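The resize logic described above can be sketched roughly like this. This is my own simplified version of the idea, not the actual node's code; `target_long_side=1280` and snapping to a multiple of 32 are the values discussed in the video.

```python
def fit_for_wan(width, height, target_long_side=1280, multiple=32):
    """Scale an image so its longer side matches target_long_side,
    keeping proportions, then snap both sides to a multiple of 32."""
    scale = target_long_side / max(width, height)
    new_w = round(width * scale / multiple) * multiple
    new_h = round(height * scale / multiple) * multiple
    # Guard against rounding an extremely thin image down to zero.
    return max(new_w, multiple), max(new_h, multiple)

# A 704 x 1280 portrait is already the ideal 5B size, so it passes through:
print(fit_for_wan(704, 1280))   # -> (704, 1280)

# An arbitrary photo gets scaled and snapped to valid dimensions:
print(fit_for_wan(1000, 1500))  # -> (864, 1280)
```

Swapping `multiple=32` for `16` mirrors the looser option mentioned above; either way, both output sides are divisible, which is what prevents the loading errors.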
This is what you don't want to see, right? Uh, I put this in. So, if you want a folder, you put that here. This is required; you have to have a folder. You also have to have a file name. So now it will go into the folder Wan 2.2. First it goes into outputs, then Wan 2.2, then "my wan 5B video", and then it actually slaps the pixel size onto your file name as well. That's kind of handy, at least I think it is. And then you're going to get your beautiful image, or not image, a video actually. So here we have our woman looking to the side. I'd say it's looking pretty good. And this is a 15-step generation, but I'm going to show you a little bit how that works. Notice that the sun shining through the hat here is still looking fairly realistic, you know, as it's moving about.

Okay, so what's going on here? "Wait, Seb, you have two samplers?" Yes, I sure do. Now, I'm not using any of the ClownShark samplers and some of that more advanced stuff that people have been using for 14B; I think that's for a later video, and this is a more simplified workflow for 5B. The reason we are using two samplers is that we are generating this video in two stages, or actually multiple steps, but in two separate generations. So we're running half of the generation with a higher CFG, and then we're running the last part of the generation with a lower CFG, a CFG of one. Why? Okay, so a CFG of one will increase the speed by about two times. That's a massive speed increase. But when you do that, you're decreasing the quality and the prompt adherence, and you're killing the negative prompt; you don't get the negative prompt at all. So if we set CFG 1 for the full duration, our output is going to suffer. It's going to look, well, not bad, but it's not going to understand your prompt as well.
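Why CFG 1 is roughly twice as fast and ignores the negative prompt falls straight out of the classifier-free guidance formula. A toy sketch with plain Python lists, just to show the arithmetic, not actual Wan inference:

```python
def guided_prediction(cond, uncond, cfg):
    """Classifier-free guidance: blend the positive-prompt prediction
    (cond) with the negative/unconditional prediction (uncond)."""
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]

cond = [1.0, 2.0, 3.0]    # toy prediction from the positive prompt
uncond = [0.5, 0.5, 0.5]  # toy prediction from the negative prompt

# At CFG > 1, the negative prompt pushes the result away from uncond:
print(guided_prediction(cond, uncond, 3.5))  # -> [2.25, 5.75, 9.25]

# At CFG 1 the formula collapses to exactly `cond`:
#   uncond + 1 * (cond - uncond) == cond
# The negative-prompt pass contributes nothing, so a sampler can skip
# that second model evaluation entirely -- hence the ~2x speedup.
print(guided_prediction(cond, uncond, 1.0))  # -> [1.0, 2.0, 3.0]
```

This is also exactly why the negative prompt "dies" at CFG 1: its prediction cancels out of the equation, no matter what you wrote in it.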
So what we're doing is running this at CFG 3.5 first, so the model understands our prompt, and then we're swapping and getting that speed increase. Confusing? You got it. And that actually works.

So here we're setting the total steps. I find that 15 works great if you have close-up videos, stuff like this. If you have, remember, the woman walking from before, if you have stuff like that, I would probably increase it to 20 to 30 steps. Right? Then we have some simple math here: the total step value is divided by two, or multiplied by 0.5, so we're getting half of this value, and we're sending that to the end step and the start step. So that's, uh, probably going to be seven or eight. The end step here, let's say it's seven, right? So it runs this sampler from 0 to 7 steps, and then the second sampler from 8 to 15 steps to finish the generation. Pretty cool, no?

And then we have a seed. We have a randomized seed every time, but we are using the same seed for both of the samplers. Is that required? I'm not sure. We are using Euler. Feel free to test multiple ones. Some schedulers work great when you're only using one sampler, but when you're chaining two samplers like this, they do not work. For example, I think FlowMatch Pusa is working amazingly as a default sampler for Wan 2.2 5B, but when you're chaining like this, it does not work. So we're using Euler here, right? And then we're just sending that on, and it's finished.

Now, there are some more options here. You don't have to worry about this; this is just the file name handling. We're getting the width and the height and we're converting that into the file name, so your file name gets the beautiful pixel size in the name.

Let's look at some of the optionals, right? Block swap. What is that? Well, if you want it, you click here, or here. Can't see this?
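The step-splitting math described above can be sketched like this. It's my own illustration of the idea, assuming the split point is simply half the total steps rounded down, with the second sampler picking up at the boundary where the first one stopped.

```python
def split_steps(total_steps):
    """Split a sampling run into two chained passes:
    a high-CFG pass for the first half, then a CFG-1 pass to finish.
    Returns ((start, end), (start, end)) step ranges for the two samplers."""
    mid = int(total_steps * 0.5)   # "total steps multiplied by 0.5"
    first = (0, mid)               # high-CFG sampler: steps 0..mid
    second = (mid, total_steps)    # CFG-1 sampler finishes the rest
    return first, second

# With the 15 total steps used in the video:
print(split_steps(15))  # -> ((0, 7), (7, 15))

# Bumping to 20 steps for harder scenes moves the swap point with it:
print(split_steps(20))  # -> ((0, 10), (10, 20))
```

The key detail is that the second sampler's start step equals the first sampler's end step, so the chain covers every step exactly once with no gap and no overlap.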
Well, that's because in rgthree up here you have settings, and you want "show fast toggles in group headers"; you want to have that on for bypass, and you want it on always. That will give you this button, and then you can easily see how easy that is. You can also use a Fast Groups Bypasser like this. If you enable block swap, you will be able to run Wan 2.2 with lower VRAM. What does that mean? Well, let's say that if I ran this using 14 GB of VRAM, then if I enable block swap, it's going to take me 10 gigabytes of VRAM. That's, I don't know, 30-ish percent less, right? It will, however, increase generation time. I repeat: this will increase generation time. It will not reduce quality, though. So if you are limited by out-of-memory errors, try block swap.

If you want to use a LoRA, just enable this. Uh, I think I have a FastWan LoRA as default, but I mean, you can use any LoRA, doesn't matter. You can just Google a Wan 2.2 LoRA and slap it in here. If you want the FastWan one, I think there's, let me see, here is the link to that: FastWan 2.2 TI2V 5B, right?

And I think we've covered most of what's going on here. There are some fancy Set and Get nodes that you don't have to worry about. What you need to worry about is loading an image, setting the number of frames you want, writing your prompt, changing the folder or the file name if you want, and, well, you generate. Click up here, Run, or Ctrl+Enter. That's how I do it. Ctrl+Enter. So easy. And the only thing you kind of need to worry about down here is total steps. So if you're changing this a lot, well, you could move it up here and be like, "Oh, this is important, I'm testing this and changing it all the time." Okay, makes sense. And if you want to do iterations of your generations and you're testing stuff, well, maybe you set this seed to fixed, and then you can change some of the settings and see what happens and compare the results.
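The file-name handling mentioned earlier, which appends the pixel size to the output name, might look something like this. The exact pattern below (underscore separator, `width x height`, `.mp4` extension) is my guess, not pulled from the workflow itself:

```python
def build_output_path(folder, subfolder, name, width, height, ext="mp4"):
    """Compose an output path in the spirit of the workflow's file-name
    handling: <folder>/<subfolder>/<name>_<width>x<height>.<ext>.
    The separator and extension here are assumptions for illustration."""
    return f"{folder}/{subfolder}/{name}_{width}x{height}.{ext}"

# Using the folder and file name from the video at the ideal 5B resolution:
path = build_output_path("outputs", "wan2.2", "my wan 5B video", 704, 1280)
print(path)  # -> outputs/wan2.2/my wan 5B video_704x1280.mp4
```

Baking the resolution into the name is handy when you are A/B testing step counts and seeds, since you can tell generations apart without opening them.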
That way you will get some beautiful Wan 2.2 generations with 5B. Hope you learned something today. As always, have a good one. See you.
Image2video Wan 2.2 5b for ComfyUI
Channel: Sebastian Kamph