Transcript of Wan 2.2 Pusa V1.0 - Multi-Features In One LoRA Fine Tune Model.

Video Transcript:

Hello everyone. We are going to experiment with a new model for AI video, Wan 2.2. This is Pusa V1.0, developed by Raphael Liu and a team. Pusa V1.0 builds on the cutting-edge Wan 2.2 architecture, and here is what they mention: Pusa delivers cinematic-quality videos with incredible efficiency. So what's special about this model? They use something called vectorized timestep adaptation, or VTA, an approach that gives frame-by-frame control over video generation. This means Pusa can handle a wide range of tasks in one unified framework, from crafting videos straight from text prompts to animating a single image, connecting start and end frames, or even extending and completing existing clips with seamless transitions. Well, let's see. So, what is unique about this model? Pusa V1.0 was trained on just 4,000 high-quality video samples, thousands of times smaller than competitors. Yet it outperforms them, scoring an impressive 87.32% on VBench-I2V. Plus, it's cost-effective, with a training cost of just $500 compared to over $100,000 for similar models. Also, in version 1.0, it combines with LightX2V for generation speed acceleration. You get good results in as few as four inference steps, making it perfect for creators on consumer-grade hardware like an NVIDIA RTX 4090.

So let's try out how to run Pusa V1.0 in ComfyUI. First, we need to download the LoRA model, and you've got two ways to do that. The first one is, of course, the official Pusa Hugging Face repo, where you can download the high-noise and low-noise Pusa safetensors files. Now, that's going to be a 4.9 GB file. If you want something more trimmed down and convenient to load in ComfyUI, you can go to the WanVideo Comfy repo. That's the Hugging Face repository for the WanVideo wrapper. There's a folder there now for Pusa. When you click into the Pusa folder, you'll see Wan 2.1. That's the previous one, which launched just a few days before the Wan 2.2 base model release. I haven't really talked about that LoRA model before, because I didn't think it was worth mentioning once Wan 2.2 came out. So, the Wan 2.2 Pusa version 1 is right here. You've got the high-noise one, resized dynamic rank 98 in BF16, as well as the low-noise model. Both are almost 1 GB in file size, which makes them way more convenient for running smaller models in ComfyUI using the model loader.

And right here I've just run these examples. The side I highlighted is running with Pusa V1.0; the other one is just Wan 2.2 linked up with the LightX2V LoRA model. So I want to show some examples of running this in the native nodes, and later we'll try it in the WanVideo wrapper too. The thing is, if you're running this LoRA model by itself, you're going to get a pretty blurry result. And as you can see, I've got two groups set up here, one using Pusa only and the other using the LightX2V LoRA model alone. When you do a side-by-side comparison, you can see it's not quite there when it comes to high quality. This model isn't really the way to go if you're chasing a better-quality result, especially if you load this LoRA model alone. One thing worth mentioning: the LoRA strengths are based on the recommended settings from the Pusa Hugging Face repo, high noise 1.5 and low noise 1.4. Now take a look at another example. Here's image-to-video. The same thing happens when you're just running this LoRA model by itself. LightX2V on the other side gives you way better results than Wan 2.2 with the Pusa LoRA.
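To keep those settings straight, here's a minimal sketch, just plain Python, of the LoRA configuration used in these tests. The filenames are illustrative stand-ins for the resized rank-98 BF16 files from the WanVideo Comfy repo, and the strengths are the recommended values quoted from the Pusa Hugging Face repo; in ComfyUI itself you'd set these in the LoRA loader nodes rather than in code.

```python
# Illustrative sketch of the Pusa LoRA settings (hypothetical filenames;
# in ComfyUI these map to LoRA loader nodes, not Python code).
from dataclasses import dataclass

@dataclass
class LoraEntry:
    path: str
    strength: float

# Recommended strengths per the Pusa Hugging Face repo, as quoted in the video:
# high noise 1.5, low noise 1.4.
high_noise_loras = [LoraEntry("Wan2.2_Pusa_V1_high_noise_rank98_bf16.safetensors", 1.5)]
low_noise_loras  = [LoraEntry("Wan2.2_Pusa_V1_low_noise_rank98_bf16.safetensors", 1.4)]

for group, loras in [("high-noise", high_noise_loras), ("low-noise", low_noise_loras)]:
    for lora in loras:
        print(f"{group}: {lora.path} @ strength {lora.strength}")
```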
The fire, flames, and smoke with LightX2V look way clearer. So, let's see what we can actually do with this model, what features it has, and what you can play around with. For a better way to use this LoRA model, here's a fix: try stacking the LoRAs. You can stack Pusa with LightX2V, both high-noise and low-noise versions, across both sampling groups. Once you do that, your video generation goes back to normal. And when you compare it to the earlier clip that only used LightX2V, the quality is basically the same. So for both image-to-video and text-to-video, there's no real difference if you're just running Pusa the standard way.

Now, the unique thing about the Pusa model is that it's got these features: start frame, end frame, and video extension. Plus, it can work alongside LightX2V to run video generation in just four steps. So, let's test out the video extension feature and see how it works with Pusa. For the Pusa Wan 2.2 LoRA models, you can also run them in the WanVideo wrapper. There's an updated example workflow in there you can check out. It uses the exact features mentioned in the Pusa Hugging Face repo, like video extension.

So, here's how the video extension works. It captures your input video and pulls the last frames from it to use as the start frames. Then at the end it stitches everything back together into the final output. The end frame gets sent to the WanVideo encoder, which turns it into extra image latent data. And as you can see, there's a new node called WanVideo Add Pusa Noise. That node lets you run the video using image embeds and adds multiple noise inputs from the scheduler, then sends it all as one single image embed into the sampler. Now, on the sampler side, there's a WanVideo scheduler that looks a lot like the KSampler Advanced. You can set start steps and end steps, measured against total steps. You can also adjust the shift number, which affects multiple samplers too. Then it goes into the WanVideo sampler, where you've got both high-noise and low-noise sampling. At the end, everything gets stitched back together with the extended video from this part: VAE decode kicks in, and it's merged back with the input frames.

This is just a demo showcase, by the way. But here's the real deal, the actual video extension setup. You've got your input start frames from the beginning of your video. Then the extension part from the VAE decode gets stitched into the image batch, and you end up with one full video including the extension. This example is just a few seconds long, and those 81 frames at the end are all generated by the AI using the Pusa LoRA model. Here's a clearer look at where the extension happens. As you can see, after the guy turns his head to the left, the rest, the jumping, the bumping motions, is all generated by the Pusa LoRA model.

Now, here's the model loader. We're using the Wan 2.2 text-to-video model. This isn't an image-to-video model, by the way. And over here in the LoRA stack, we've got both the Pusa model and the LightX2V LoRA models. I'm actually using the LightX2V model from Wan 2.1 here; it tends to perform a bit better for text-to-video and other tasks. The VAE loader comes with the WanVideo text encoder. The WanVideo text encoder has some options you can play with, like disk caching or choosing whether to run the encoder on GPU or CPU, all in one single node for easy setup. Plus, you can input your text prompts right there. More convenient. So, I'm going to run this video now. This one was generated using Pusa in the native node workflow.
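Here's a rough, self-contained sketch of that stitching idea in plain Python with NumPy. The `extend_video` helper and the dummy generator are hypothetical, not the actual WanVideo wrapper nodes; the point is just the frame bookkeeping: take the last overlap frames of the input as the conditioning start frames, generate the new frames, then drop the regenerated overlap when concatenating. With the numbers used later in the demo (an 81-frame input, 201 generated frames, 24 overlap frames) that works out to 258 frames, which at Wan's usual 16 fps lands around the 16-second result mentioned further down, assuming the overlap isn't duplicated.

```python
# Conceptual sketch of video-extension stitching (hypothetical helpers;
# the real work is done by the WanVideo wrapper nodes).
import numpy as np

def extend_video(input_frames: np.ndarray,
                 generate_fn,
                 total_new_frames: int = 201,
                 overlap: int = 24) -> np.ndarray:
    """Use the last `overlap` frames as conditioning, generate new frames,
    then concatenate so the overlapping region isn't shown twice."""
    start_frames = input_frames[-overlap:]            # frames fed to the sampler
    generated = generate_fn(start_frames, total_new_frames)
    return np.concatenate([input_frames, generated[overlap:]], axis=0)

# Dummy generator so the sketch runs end to end.
fake_generate = lambda start, n: np.zeros((n, *start.shape[1:]), dtype=start.dtype)
clip = np.zeros((81, 480, 832, 3), dtype=np.uint8)    # 81-frame input video
print(extend_video(clip, fake_generate).shape[0])     # 81 + (201 - 24) = 258 frames
```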
As you can see, it's only loading 81 frames, which means the video is really short. So, I'm going to extend it, make it a bit longer. To do that, you just input the total number of frames here, kind of like the looping and video extensions I've talked about in past videos. You've got to set how many frames you want. For example, I'll put in 201 frames. That means I'm roughly tripling the video length: instead of just 81 frames, now it'll generate way more. After setting the total frames, there's the input frame count; this is basically the overlap, like how many frames are used in the stitching part of the extension. I'll just leave it at default for this demo. I'll also change the text prompt to match what I want for this extended version. So now it'll be a video of a warrior riding a horse, and he'll keep riding with a longer extension. Let's run it and see how it turns out.

Here's what you get after generating. First, in the WanVideo scheduler, you'll see the sigma graph. When you set the sigma boundary, which is 0.875 by default, it maps onto this curve. On the curve, 0.875 marks the point where it cuts off the high-noise phase and switches to low noise for the rest of the video generation. This gives you a clearer picture of how the model sampling progresses and how efficient your step count is for processing and generating the video. You want to keep this curve smooth like this, instead of starting at one and dropping straight down. That kind of drop means your video won't process well. So keep it like this. From what I've seen, the sweet spot for sigma in Wan 2.2 is somewhere between 0.8 and 0.9, so sticking around 0.875 is a good general rule.

Now, looking at the video extension result, we've got the last 24 frames; then it continues into the 201 frames I just generated, adding that extra duration. And here you can see we're bringing in both the reference video and the generated one to check out the end of the clip. Right here, after the extension, you can see the warrior keeps riding the horse and raises his sword, extending that horse-riding scene smoothly. So yeah, this is how you can use video extension with the WanVideo wrapper. And honestly, this method is way more convenient: less setup, easier to use. If you go the native node route, you'll need more customization, but if you're experienced, that gives you more flexibility.

Anyway, this is how you run Pusa with video extension, using the last 24 frames as a reference and continuing the generation. There's a custom node here that adds extra latent data into the WanVideo wrapper, plus extra noise for Pusa to process. Compared to what I did in previous videos, just using a single frame with image-to-video and extending the length, having overlap frames is way better. It's actually similar to how we used to do it with Wan 2.1 VACE, where we added extra frames as overlap frames to generate long videos. Same concept for video frame handling. So here, after about 5 seconds, it adds the extra 201 frames to make it about 16 seconds. Pretty cool.

As for Pusa, yeah, there's a little improvement in what it offers feature-wise compared with their Wan 2.1 Pusa. But in terms of video quality, it's still not the go-to LoRA model; you can get the same or even better quality without using Pusa at all. So, personally, I'd say maybe the video extension feature is useful. Start frame and end frame? You don't need the Pusa LoRA for that; you can do it natively in ComfyUI with Wan 2.2 already. That's it for this video.
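To see why 0.875 splits the steps the way it does, here's a small illustrative sketch, not the wrapper's actual code. It assumes the flow-matching "shift" remap (sigma' = shift·s / (1 + (shift − 1)·s)) that Wan-style models commonly use; the shift value of 8.0 and the 4-step count are just example numbers, and the exact schedule in the WanVideo scheduler node may differ.

```python
# Illustrative sketch: build a shifted sigma schedule and find the step where
# it drops below the boundary, splitting high-noise and low-noise sampling.
import numpy as np

def shifted_sigmas(num_steps: int, shift: float = 8.0) -> np.ndarray:
    """Linear sigmas 1 -> 0 remapped with the flow-matching 'shift':
    sigma' = shift*s / (1 + (shift - 1)*s)."""
    s = np.linspace(1.0, 0.0, num_steps + 1)
    return shift * s / (1.0 + (shift - 1.0) * s)

def split_step(sigmas: np.ndarray, boundary: float = 0.875) -> int:
    """Index of the first step whose sigma falls below the boundary;
    earlier steps go to the high-noise model, the rest to the low-noise model."""
    return int(np.argmax(sigmas < boundary))

sigmas = shifted_sigmas(num_steps=4, shift=8.0)
print(sigmas.round(3))        # [1.0, 0.96, 0.889, 0.727, 0.0]
print(split_step(sigmas))     # 2 -> steps 0-1 high noise, steps 2-3 low noise
```

With these example numbers, only the last couple of steps fall below 0.875, which matches the idea that the high-noise model handles the early denoising and the low-noise model finishes the detail.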
This is another way to handle video extensions, using the Pusa Wan 2.2 LoRA models. I'll see you guys in the next one. See you.


Channel: Lions Garage
