Transcript of WAN 2.1 Vace for High-Quality AI Video Creation GGUF + Causvid + ControlNet

Video Transcript:

If you ever struggle to create AI videos because of VRAM limitations or complex settings, this tutorial will shift everything. I'll walk you through how I use the WAN 2.1 VACE model and GGUF quantized models to generate stunning video outputs even on lower-end GPUs. You'll learn how to build a simplified ComfyUI video workflow that uses the CausVid LoRA for faster rendering and OpenPose conditioning for full-body motion consistency, allowing you to turn any image into a cinematic-quality video.

To get started, first let's visit the page here by QuantStack. There are quite a few quantized models here, all 14-billion-parameter models, so check the sizes, then download according to your storage or VRAM preference. I'm going to download the 14B Q5_K_S model here, then save it into ComfyUI under the models/unet folder. I have mine already downloaded here.

Next, visit the page here to download the text encoder model. Since we are using GGUF, the scaled version worked fine for me, but you might find other versions. Save it into ComfyUI under the models/text_encoders folder. After that, we also need the VAE model, so download the model here from Hugging Face, then save it under the models/vae folder, as you can see here.

With those models downloaded, we need to download the CausVid LoRA as well to speed up the video generations. So download this model and save it into ComfyUI under the models/loras folder. I have created a subfolder here for Wan 2.1 and saved it in there.

Once these four models are fully downloaded, let's open ComfyUI. Inside ComfyUI, go to the Manager, then the Custom Nodes Manager. Make sure you already have these custom nodes installed: the Video Helper Suite, the ComfyUI-GGUF custom node, and also the ComfyUI ControlNet Auxiliary Preprocessors. Once you have these installed, we go back to the canvas.

Let's first start with the ControlNet workflow. Right-click to add a node, go down to Video Helper Suite, and select Load Video (Upload). I'll zoom in on this and click here to load my video of this surf model. Let's change a few settings.
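As a quick recap of the downloads above, the four models map onto ComfyUI's standard model folders. Here's a small sketch of that layout; the filenames are illustrative placeholders, so use whichever quantization and versions you actually downloaded:

```python
from pathlib import Path

# Where each download goes, relative to the ComfyUI install directory.
# Filenames are placeholders, not the exact names on Hugging Face.
MODEL_DESTINATIONS = {
    "Wan2.1-VACE-14B-Q5_K_S.gguf": "models/unet",                   # quantized diffusion model
    "umt5-xxl-encoder-scaled.safetensors": "models/text_encoders",  # text encoder
    "wan_2.1_vae.safetensors": "models/vae",                        # VAE
    "causvid_lora.safetensors": "models/loras/wan21",               # CausVid LoRA (custom subfolder)
}

def destination(filename: str, comfyui_root: str = "ComfyUI") -> Path:
    """Full path a downloaded model file should be saved to."""
    return Path(comfyui_root) / MODEL_DESTINATIONS[filename] / filename

print(destination("wan_2.1_vae.safetensors"))
```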
I'll use the same frame rate as the video, which is 25. Change the width to 528, and I'll use a height of 960 for a vertical aspect ratio. I'll skip 12 frames from the beginning of the video, so this starts further in, then change the format to Wan.

Next, we need to extract the depth map from the video. I'll use the depth preprocessor and select Depth Anything V2. Why do I suggest Depth Anything V2? Because after comparing the results to other depth preprocessors, it gives accurate distance, clear details, and better extraction results. Connect the video node to the preprocessor node and change the resolution to 640. Now let's see a preview using Video Combine. Link the preprocessor in here and update the settings: I'll match the frame rate to 25, create a folder path for the file name, and set the format to H.264. Let's zoom out from here. We're all good here, so I'll hit run. All right, nicely extracted: we have our depth map from the video. I'll drag this below the preprocessor node.

Next, let's import the reference image using the Load Image node. I'll highlight all of these nodes and place them into a group. I'm going to connect the reference image a bit later for better understanding and clarity.

For now, let's create the GGUF workflow with WAN VACE. Right-click to add a node, go to bootleg, and select Unet Loader (GGUF). I have a dedicated video about GGUF, so if you do not have this installed, please watch that after this. Select the GGUF model we downloaded earlier, whichever one you downloaded. Next, we need a Load CLIP node. Search for Load CLIP, set the CLIP model to the scaled UMT5-XXL version, then change the type to wan. Right-click to add another node, go to loaders, and select Load VAE. I'll drag this below, then select the Wan 2.1 VAE model, which we also downloaded.

Now let's connect all of these. I'll drag out from the Unet Loader, go to search, type "lora loader", and select LoraLoader (Model Only). You should find the CausVid LoRA for Wan 2.1, which we saved earlier.
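For reference, the loader chain just built on the canvas can also be written down in ComfyUI's API (prompt) format. Treat this as a sketch: the GGUF loader class comes from the ComfyUI-GGUF custom node, the filenames are placeholders for whatever you downloaded, and the LoRA strength shown is the value used later in this workflow.

```python
# Sketch of the model-loading subgraph in ComfyUI API format.
# "UnetLoaderGGUF" is provided by the ComfyUI-GGUF custom node.
# Links are written as ["source_node_id", output_index].
loaders = {
    "1": {"class_type": "UnetLoaderGGUF",
          "inputs": {"unet_name": "Wan2.1-VACE-14B-Q5_K_S.gguf"}},
    "2": {"class_type": "CLIPLoader",
          "inputs": {"clip_name": "umt5-xxl-encoder-scaled.safetensors",
                     "type": "wan"}},
    "3": {"class_type": "VAELoader",
          "inputs": {"vae_name": "wan_2.1_vae.safetensors"}},
    "4": {"class_type": "LoraLoaderModelOnly",
          "inputs": {"model": ["1", 0],                      # output of the GGUF loader
                     "lora_name": "wan21/causvid_lora.safetensors",
                     "strength_model": 0.5}},
}
```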
From my testing and results, reduce the strength to 0.5. Let's drag out the clip output, move down, and select CLIP Text Encode. Make a copy of this below and link the clip into the duplicate as well. This will be the positive prompt, in green; then change the other to red as the negative prompt. I'll go ahead and connect the VAE using the Anything Everywhere node. This is all easy to understand, right?

Panning to the right, we need the most important node of the workflow. Right-click to add a node and go to conditioning, then video models. We have covered these models in the previous videos, so for this video select WanVaceToVideo. Connect the positive prompt, and the negative goes in here as well; the VAE will be connected automatically. Next, we need a control video, so I'm going to zoom out from here, drag our control video out from the depth map, and link it into the control_video input below. Next we see the reference image, so let's zoom out once again. Now we can connect the reference image from earlier: drag from its output into the reference_image input of the VACE node.

From the WAN VACE node we need the KSampler. Join the positive to the positive, the negative to the negative, and the latent goes into the latent as well. Let's drag out from the LoRA, go to search, type "model sampling", and choose ModelSamplingSD3. Change the shift to 8 for better results, and now we can link this into the KSampler.

Once again, double-click and type "trim video latent". This is a native node available in ComfyUI. From here, convert the trim_amount into an input. Let's connect these as well: the KSampler latent goes into the samples, and drag the trim_latent output from the VACE node into the trim_amount. Let's drag out the latent from here and select VAE Decode. We are getting there. Drag out from VAE Decode, then search for Video Combine, which is from the Video Helper Suite. I'll zoom out from here, then organize all these nodes into separate groups to keep things tidy. Feels simple, right?
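A note on the TrimVideoLatent step above: WanVaceToVideo folds the reference image into the latent sequence, and its trim_latent output tells TrimVideoLatent how many latent frames to cut off after sampling. As a rough sketch of the frame arithmetic involved (assuming WAN's usual ~4x temporal compression, where N pixel frames map to (N - 1) / 4 + 1 latent frames; check your model card):

```python
def latent_frame_count(pixel_frames: int, temporal_compression: int = 4) -> int:
    """Latent frames produced for a given pixel-frame count, assuming the
    common (N - 1) / c + 1 relationship used by video VAEs."""
    assert (pixel_frames - 1) % temporal_compression == 0, "use N = c*k + 1 frames"
    return (pixel_frames - 1) // temporal_compression + 1

# The tutorial's 81-frame clip:
print(latent_frame_count(81))  # 21 latent frames
```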
Before we test this out, let's put in the right settings for the various nodes. I'll start with the positive prompt. Let's zoom into the node; I already have my prompt description, so I'll paste it in here. For the negative prompt, I'll paste the default prompt from WAN. Then let's pan to the VACE node. Change the width to 528 and the height to 960; make sure to use the same aspect ratio as the video. I'll leave this at 81 frames, the maximum frame count for Wan 2.1.

Moving on: since we are using the CausVid LoRA, the KSampler details here are really critical to avoid low-quality videos. I'll use a random seed of 96 and change this to fixed. Lower the steps to 7, and the CFG should be 1. For the sampler name use ddim, and for the scheduler use ddim_uniform. On the Video Combine node, make sure the frame rate is the same, 25, so your video generations are in sync. I'll save the file name as something different, and the format changes to H.264. I'll zoom out from here. Ready to test this? Let's go ahead and run it.

All right, so this is nice and easy. The reference image is following the depth control exactly, and the quality from VACE and CausVid is quite excellent. So far very easy, right?

Now, here is one challenge I didn't expect when I was starting out. Looking at my previous video results here, the output did not precisely match my reference image. So how do we solve that using the same workflow? Let's move up to the ControlNet group. I'll change the reference image to demonstrate this better, pull out from here, and let's just run this to see the result using depth map control. All right, this is completed, and I'll move it up to compare the video and the reference image. In the reference image she wears a jacket, as we can see, but this is not represented in the final generation; rather, she is influenced into wearing a bikini from the original video.

So let's fix this. I'll move up here on the canvas, right-click to add a node under ControlNet preprocessors, and choose the DWPose
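Before moving on, here are the CausVid-friendly KSampler settings from above gathered in one place. This is only a recap of the values stated in the video, written as the node's input dict; distilled-style LoRAs like CausVid generally want very few steps and a CFG of 1.

```python
# KSampler settings used with the CausVid LoRA, as described above.
ksampler_inputs = {
    "seed": 96,
    "control_after_generate": "fixed",
    "steps": 7,                     # low step count enabled by CausVid
    "cfg": 1.0,                     # CFG must stay at 1 with this LoRA
    "sampler_name": "ddim",
    "scheduler": "ddim_uniform",
    "denoise": 1.0,                 # default; not changed in the video
}
```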
Estimator. This gives more freedom to follow the reference image while still guiding the video. Connect the video to the DWPose preprocessor and change the detect-face setting: disable this. Next, double-click, then search for Image Blend, and let's connect both preprocessors so we can adjust their weight and influence. I'll use 0.95 for a higher OpenPose influence. I'll make a copy of the video node to see what this looks like, but before I run this, I'll zoom out of the canvas and bypass these groups so only the ControlNet workflow runs. Once this is good, let's hit run to preview the OpenPose preprocessor only. All right, so this appears acceptable.

Now let's use the OpenPose output to influence the video instead of the depth map. I'll drag the Image Blend output all the way down into the control_video input of the VACE node. Once this is connected, I'll un-bypass all the groups to make them active again. Then remember to change the prompt description and motion for whatever reference image you use. Let's slide to the Video Combine node; I want to make a copy of the video node so we can compare both video results. Let's zoom out from here. Easy to follow so far? I'll hit run to see the OpenPose results.

All right, comparing both videos, we can now notice the improved consistency of the video and the huge difference this can produce. We can now see more of the jacket and the outfit compared to just using the depth preprocessor. So I'm going to delete this from here; we no longer need it.

No preprocessor is strictly better than another. Simply try each ControlNet with different images, move down to change your prompt description, and hit Queue Prompt to figure out which preprocessor gives you the most consistency with the reference image you are using, and the same goes for any other preprocessor. If these results appeal to you, all of these materials are available for you in the creator's resources. So I hope you all feel inspired to brainstorm some exciting ideas with random images.
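The Image Blend weighting used above is just a linear mix of the two preprocessor outputs. As a minimal sketch of the math at a single pixel (assuming the blend factor weights the second input, which is how a factor of 0.95 gives OpenPose the stronger influence here):

```python
def blend(depth_px: float, pose_px: float, blend_factor: float = 0.95) -> float:
    """Linear blend of two control images at one pixel:
    result = (1 - f) * first + f * second."""
    return (1.0 - blend_factor) * depth_px + blend_factor * pose_px

# With blend_factor = 0.95, the OpenPose input dominates:
print(blend(1.0, 0.0))  # ~0.05: the depth contribution is mostly suppressed
```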
And as always, don't forget to leave a like; it really means a lot. Joining the membership as a Creator Premiere member is another way you can support the channel. Thank you to every member so far for your support. If you enjoyed this video, you can view more of my other videos here, and I'll see you all in the next one.

Channel: goshnii AI
