Transcript of Wan2.2 VACE Fun A14B Gives Better Character Consistency (ComfyUI)
Video Transcript:
These open-source AI video models keep coming thick and fast, and this week we get Wan Fun VACE. Why is that a good thing? Well, I've scribbled it down here in this note. VACE basically adds a control layer with improved character and style consistency over plain old Wan Fun. So, if you had character or style drift issues with Wan Fun, then VACE aims to fix those inconsistencies. It's much like Wan Fun, but more betterer.

Once again, we're going to be using nodes from the awesome Kijai. To craft these workflows yourself at home for free, do check out all of the example workflows provided with that custom node pack, as that is where I started from too. Alternatively, if you enjoy these workflows and videos and find them helpful, then you can support the channel via Patreon. That helps me create even more workflows for you and to share these videos with everyone. Of course, the choice is yours, and a massive thank you to all those who are able to contribute. As usual, I've neatly arranged things into their little rodent-method boxes for ease, and if you need more info, then I've already done a number of Wan video guides, so take a look at those.

Loading things with VACE is slightly different to Wan Fun in a couple of ways. First, we're going to be using the Wan 2.2 text-to-video model here, meaning if you've already got that model, then it will save you a download. The other change you will have noticed is these new VACE control modules. They change the standard text-to-video model into Wan Fun VACE, which is what we're testing here. You can pick your own LoRAs, but here I've gone for the four-step Lightning one. This way you can generate videos in under one and a half minutes with a super fast card. There are plenty of other options, such as torch compile and block swap, which we've seen already in previous videos. So let's dive straight into the prompt. Once again, I'm using the same prompt for all of the examples you're going to see.
And this time I'm asking Wan to generate a woman wearing a red dress with a castle courtyard background, which should have a green and red bird atop a fountain. All nice and straightforward thus far. But what about the results? Well, I have plenty of those. Here we can see text to video, video to video with matting, reference images, first frame / last frame, and reference only. So, let's dive into those results.

Nothing really changes in the text to video, the first one here; we've got all the standard stuff. Just zoom in on this to get a better view of the courtyard, and as you can see, the results with plain old text to video are indeed very impressive. A nice four-step thing there. It's got the woman, she's flicking her hair, we've got the green and red bird, the courtyard. Yep, that seems to have followed the prompt very nicely.

Option two, then: video to video. Now, last week we used the Wan Fun node, and of course this week we've got the WanVideo VACE Encode node instead. This has input frames, reference images, and you can even use masks as well. For this particular configuration, I'm using a shift of nine, four steps in total, and two steps on each of the samplers. I've got an optional CFG schedule there which I'm not using, but we'll have more on that in just a moment.

The results here are pretty decent. Yes, we've got the woman, she's wearing the red dress, there's a courtyard, there's the fountain, and there's the green and red bird. Although we haven't really got much water coming out of that fountain. Now, of course, you'll notice the woman does look a little bit different to the input video, and I've got plenty of things to go through here. So, are you ready for this rather long selection of tests? This is the first one; let's take a look at a load more.

Now, remember I mentioned that CFG schedule? Well, this time I've got it attached. So, we've got one step at 2.5 CFG, and the rest of them go through at CFG 1.
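The schedule described here (higher guidance on the very first step, then CFG 1 for the rest) can be sketched as a simple per-step list. The helper name below is purely illustrative, not the actual node's API:

```python
def make_cfg_schedule(total_steps, first_cfg=2.5, rest_cfg=1.0):
    """Build a per-step CFG list: stronger prompt guidance on
    step one only, minimal guidance on the remaining steps."""
    return [first_cfg] + [rest_cfg] * (total_steps - 1)

# Four steps total, as in the video's preferred configuration.
print(make_cfg_schedule(4))  # → [2.5, 1.0, 1.0, 1.0]
```

The idea is that one high-CFG step is enough to lock in prompt details (like the dress colour) without the over-saturation that high CFG on every step can cause.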
This is quite good because, if we scroll over here, we can see the dress is now a lot more red. The fountain has also got a little bit of water coming out of it. So, I think that's a bit of an improvement. However, because I'm a nerd, of course, I did absolutely loads of testing. I've tested with different shift values, different steps, and whether I've got the LoRA set high or low, and I'll show you all the various results.

First up, then, I'm doing a shift of eight and four steps, but with only one step on the first sampler, then the remaining steps on the other sampler. In this case, it's okay, but I think the red on that dress is a bit too bright and strange, and we've got this lamp post in the background. So the slightly lower shift value of eight hasn't changed that input video as much, but the colors I think are a bit off there, certainly on her dress.

The same shift value of eight again, but I've increased that start step by two. So I'm doing two steps on the first sampler and two steps on the second one, and the result in this case is much better. We've got less of that glare on the red dress, the lamp post thing is ever so slightly less visible, and we've got the fountain coming out. This is actually my favorite one out of all of them, but we'll see some more results.

Going up one step, so three on the first sampler, the result is fairly similar: a slight change in the light, decent colors on the dress. So, not a bad result.

Still in four steps... how about a few more steps this time? Up to six steps in total, and once again one step to start with on the first sampler and the remaining on the second. And once again, we've got that horrible color: it's far too bright on the dress, and the lamp post is a bit weird, sort of in the background there. The bird is okay, but we haven't got any water coming out of the fountain. So, I think one step there is not quite enough.
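The tests above all vary how the total step count is divided between the two samplers. That split can be sketched as a tiny helper (again, a hypothetical function for illustration, not a real node):

```python
def split_steps(total_steps, first_sampler_steps):
    """Divide a denoising run between two samplers: sampler one
    handles step indices [0, split), sampler two handles the rest."""
    first = list(range(first_sampler_steps))
    second = list(range(first_sampler_steps, total_steps))
    return first, second

# The video's favourite configuration: 4 steps total, split 2 + 2.
print(split_steps(4, 2))  # → ([0, 1], [2, 3])

# The six-step variants tried later: 1+5, 2+4, 3+3, 4+2.
print(split_steps(6, 3))  # → ([0, 1, 2], [3, 4, 5])
```

As the tests show, giving the first sampler too few steps (one out of four or six) leaves colours over-bright, while an even split worked best here.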
Incrementing that step value once again, up to two steps on the first sampler, and as you can see, it's once again different. We've completely lost the lamp post, but we've also lost the green on the bird and the water coming out of the fountain as well. So, more steps are not necessarily better in this case.

Okay, how about three and six then? The result here is... all right. We've got some green back on the bird. I'll do a little zoom out there again. The lamp has gone, which is quite nice. So, it's okay, but once again still no water, although we've got a bit at the top there.

And then finally, up to four steps this time. Is it any better? Sort of. Sort of. Although we've got all these bits and pieces on her arm now. There was a little tiny bit, and that's sort of spread everywhere, so she's got some strange-looking tattoos. The bird is there, which is green and red, and we've even got a little bit of the lamp post in the background. Water seems to be coming down a bit more. So yeah, not necessarily better.

That's my basic rationale, then, for going with four steps in total, two steps on each. As you saw, attaching that CFG schedule, so you get 2.5 CFG on the first step of the first sampler, did help with the redness of the dress a little.

Now, of course, one thing with VACE is it can maintain those characters a little bit more. So, if you've got a particular face that you want to keep, VACE is one way to do it. A slightly different video this time, and what I'm doing in this case is using Robust Video Matting. That's cutting out the background, and then I'm also sending in a reference image. So we've got the video in here that's going through Robust Video Matting. I'm also taking one of those images; it happens to be frame number 80, because that's where she's pulling up her coat. That goes in as the reference image. For the sampler settings, I haven't used the CFG schedule here, just the normal settings: two and four steps.
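Picking one frame out of the matted video to serve as the reference image is just an index lookup on the decoded frame list. A minimal sketch, with a hypothetical helper name:

```python
def pick_reference_frame(frames, index):
    """Select one frame from a decoded video to reuse as a
    reference image (e.g. frame 80, where the pose is clearest)."""
    if not 0 <= index < len(frames):
        raise IndexError(f"frame {index} out of range (0..{len(frames) - 1})")
    return frames[index]

# Dummy stand-ins for decoded frames; a real workflow would pass image tensors.
frames = [f"frame_{i}" for i in range(81)]
ref = pick_reference_frame(frames, 80)
print(ref)  # → frame_80
```

The point of choosing frame 80 rather than frame 0 is simply that the reference should show the character clearly; any well-lit, unoccluded frame works.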
And I've gone for a shift value of nine, as I found a shift slightly higher than eight does tend to help. The result in this case is pretty decent. As you can see, the likeness of the person is, well, practically the same. There's a slight change on the top: it was a white strap and now it's blue, but the face looks the same to me, and the earrings are the same. Of course, because we removed the background, it's been able to easily change it into that castle courtyard.

You don't have to use video matting, of course. You can just throw the video straight in, like we're doing here. Now, one slight change I've made here is that I'm taking an image from this one, and that's index zero, so basically the first frame of that video. That one I'm resizing and sending in as the reference image in this case. The sampler settings are much the same: once again shift 9, two and four steps, no CFG schedule. And if we have a look at the results then... ooh, okay. We've definitely got the same person in there, but this is all fading and weird. So, obviously, with that first frame going in as the reference being black, and with us not using the video matting, it's having a little bit of a hard time changing that background.

What can we do about that? Well, as you might have noticed, the WanVideo VACE Encode node has a few little settings we can play with. So this time I actually lowered the strength down to 0.75. For the sampler settings, once again I've got the shift of 9 and two and four steps, but I have attached the CFG schedule this time. Sometimes it can help follow your prompt a little bit better if you do have that first CFG slightly higher than one. The result here is sort of okay. We've definitely got much the same person, yes, I can recognize her, and it has changed the background much better this time. We've got the red and green bird, and the fountain's working as well.
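Resizing the reference image to match the generation resolution usually means fitting it inside the target box and snapping the result to a dimension the model's latent space accepts. This is a sketch under assumptions: the snap-to-multiple-of-16 constraint is a common convention for video diffusion models, not something stated in the video, and the function is hypothetical:

```python
def fit_to_multiple(width, height, target_w, target_h, multiple=16):
    """Scale (width, height) to fit inside the target box while
    keeping aspect ratio, then snap both sides down to the nearest
    multiple (assumed latent-size constraint; verify for your model)."""
    scale = min(target_w / width, target_h / height)
    w = round(width * scale) // multiple * multiple
    h = round(height * scale) // multiple * multiple
    return w, h

# A 1080p frame fitted into an 832x480 generation size.
print(fit_to_multiple(1920, 1080, 832, 480))  # → (832, 464)
```

Snapping down rather than up guarantees the result still fits the target box.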
We've got the lamp post there, which is always a bit strange, and there is, as you can see, just a little tiny flicker at the beginning. So, it's not quite perfect.

All right, then. How about if we use a different image as the reference image? This time I've got the castle with the red and green bird all set, so we haven't got the character in there, and we're using this one as the reference image instead. There you can see it's resized and going into the reference image. Sampler settings are still the same: shift 9, two and four steps, CFG 3 on the first step. And what do we get this time? Oh, okay. It's a very similar person, but as you can see, the likeness has changed. The background is definitely the same; that one's quite good, and the first frame doesn't flicker quite as badly. We've got a little bit of the lamp post in there, which is a bit strange. So, it's okay, but it could be a bit better, couldn't it?

How about we change things up a bit this time? I've got a reference image, which I'm also using as the start image for this WanVideo VACE Start To End Frame node. As you can see, you can attach an end image if you want as well. I've got the control images connected because I'm changing that input video into a depth video instead, using the Depth Anything V2 Relative node from ComfyUI ControlNet Auxiliary. This image I'm also using as a reference image, so it's going in both as the start image and the reference image. Much the same settings on the sampler.

And now we get something fairly similar to that previous video matting option. The character looks much the same as in the original video, we've got the nice red on the dress, the fountain has a tiny bit of animation, and oh, the background's actually moving slightly as well, so it definitely isn't quite as static. Not a bad result, although I think so far I still prefer the video matting. You don't have to use a video either; you could just use a reference image.
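Turning the input video into a depth control video is conceptually just a per-frame map through a monocular depth estimator. A sketch, where `estimate_depth` is a placeholder for whatever model you plug in (the video uses Depth Anything V2 via the ControlNet Auxiliary nodes):

```python
def video_to_depth_control(frames, estimate_depth):
    """Convert RGB frames into per-frame depth maps to feed in as
    VACE control images. `estimate_depth` is a placeholder for any
    monocular depth model; it must accept and return one frame."""
    return [estimate_depth(frame) for frame in frames]

# Dummy estimator standing in for a real depth model.
depth_frames = video_to_depth_control([1, 2, 3], lambda f: f * 2)
print(depth_frames)  # → [2, 4, 6]
```

Using depth rather than raw RGB as the control means the motion and layout of the source video are kept while colours and identity come entirely from the reference and start images.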
So, here's one we've got. We're surrounding it with white so it will take that character's face. That one's just going into the reference image; we haven't got any input frames in this case. Although this time, for the sampler, I've gone with three and six steps, because two and four didn't work quite as well with a reference image. If it hasn't got that control video, the extra steps certainly do seem to help. I also have another little reminder note down here: if you are just using a reference image only, then that CFG schedule is also best left unconnected.

Let's have a look at the result. And there she is! That's pretty good, isn't it? So, we've got just a reference image, she's got the face to start with, ooh, and we've got the fountain in the background with a red and green bird. Very nice indeed.

So, there you go: Wan Fun VACE helps you keep your characters a little bit more accurate.

Nerdy Rodent, he really makes my day, showing us AI in a really British way. [Music]
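To wrap up, the sampler settings quoted across the tests can be collected in one place. The values below are the ones stated in the video; the dictionary structure itself is just an illustrative summary, not a real config format:

```python
# Per-variant sampler settings as quoted in the video.
# (Shift for reference-only is not restated; 9 is assumed here.)
settings = {
    "video_to_video": {"shift": 9, "steps": 4, "first_sampler_steps": 2,
                       "cfg_schedule": True},   # 2.5 CFG on step one
    "video_matting":  {"shift": 9, "steps": 4, "first_sampler_steps": 2,
                       "cfg_schedule": False},
    "reference_only": {"shift": 9, "steps": 6, "first_sampler_steps": 3,
                       "cfg_schedule": False},  # no control video: more steps help
}

for name, cfg in settings.items():
    print(f"{name}: {cfg['steps']} steps, split at {cfg['first_sampler_steps']}")
```

The pattern that emerges: an even step split, shift around nine, CFG schedule only when there's a control video to anchor the result, and extra steps when running from a reference image alone.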
Wan2.2 VACE Fun A14B Gives Better Character Consistency (ComfyUI)
Channel: Nerdy Rodent