Transcript of Wan2.2 Animate: Generate long videos with Character Replacement (GGUF)

Video Transcript:

The workflow and setup can be complicated, so I have downloaded and installed ComfyUI portable and installed the ComfyUI Manager. Just to make sure, I have also updated the program using the update batch file. Now I will run ComfyUI and get the workflow from the Wan video repository. This is the workflow: copy it and paste it inside ComfyUI.

Almost all the nodes are missing, so I will close this and install them one by one with the help of ComfyUI's node manager, from the missing-node list. All the highlights for the missing nodes are gone after a restart.

This is my models folder, containing models I downloaded before; I will keep it in the new ComfyUI. These are links to the models, and these are the FP8 models to be kept in the diffusion_models folder. I read a comment stating these should be used with 4000-series or newer cards, and the 'e5' file below should be used with 3000-series and older cards. Also, these models are around 18 gigabytes, which can be large for some users. So there are smaller GGUF files, which should be kept inside the unet folder. The GGUF files are being added here right now; I will add a link to these files in the ComfyUI notes.

The model file will be selected here, and right now I have the FP8 and the Q4 model. Let's try the GGUF model first.

Below is the VAE, and this FP32 one looks large. You can also use the older Wan 2.1 VAE in case the FP32 fails; I will add a direct download link so you don't have to look for it.

The next file is the CLIP vision file, and I have added the link to this file in the notes.

There are two LoRAs used in this workflow, and the links to both LoRAs are already given.

If you have a bigger graphics card, try disabling this block swap node, or decrease the "blocks to swap" value. The more blocks you swap, the slower the generation will be.

Next is the text encoder; the link to the file is already given. But this file is too large, so you can go one step back and scroll down; at the bottom you should find the FP8 file, which is smaller. I will add the link to this file in the workflow, and I will be using the smaller file in this video. The rest can be kept as it is.

Here, I need to upload the image of the character I want in motion. And this is where the video frames are analyzed for swapping the character. I will upload a video here. The video has 109 frames, and the pointers here will mark some coordinates on each frame to locate the character for the swap. Here, the program will show us the marking as a mask, which will be in focus during the swapping process.

Let's have a look; I will bypass the nodes below so the video generation does not start. This node will download a file for this process if it is not already present. I will queue the workflow. This segmentation file was not downloaded in my case because it was already present in the models folder. The black section has been identified; this is where the swapping process happens. The green circles mark the area to mask, and the red ones mark the area not to mask. These are the values sent further to the segmentation node. By holding the Shift key and left- or right-clicking, you can add more pointers here. Clear the pointers with this button. Once you are done adjusting, press Run to check if the mask is correct.

Below is the face identified from the given video; the face will be used to match the lip movement and expressions.
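For reference, the files mentioned above end up in the usual ComfyUI model subfolders. This is only a rough sketch of the layout described here; folder names follow the standard ComfyUI portable layout and may differ slightly depending on the loader nodes, and the exact file names depend on which variants you download:

    ComfyUI/models/diffusion_models/   the large FP8 models (around 18 GB each)
    ComfyUI/models/unet/               the smaller GGUF models (for example the Q4 file)
    ComfyUI/models/vae/                the Wan VAE (FP32, or the older Wan 2.1 VAE as a fallback)
    ComfyUI/models/clip_vision/        the CLIP vision file
    ComfyUI/models/loras/              the two LoRAs used by the workflow
    ComfyUI/models/text_encoders/      the text encoder (the smaller FP8 file is used in this video)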
Now I can activate these nodes and try to run the workflow. I forgot to give an image to swap. So the image is uploaded, and now the workflow should give us some errors.

The first error is about Sage Attention, which is missing. Let's say I do not want to use Sage Attention; I will select SDPA and try again.

The 109 frames of the video are divided into two windows to process. The division is based on this value of 77. And I got the second error, which says it cannot find Triton. So, this 77 should be the number of frames in one window, and you can increase it, but keeping more than 80 frames will impact the quality. Getting back to the error: here the program was about to process a batch of 77 frames, and then came the "Triton is missing" error.

I will get Triton from this repository. Scroll down and look at the sixth step. This redistributable file needs to be installed; it is simple, just double-click it and it should be installed. Now you need to run a few commands from here. Copy the uninstall command, go to the python_embeded folder, open CMD, and paste the command. This removes any older version of Triton installed. Now, the install command. The less-than sign means install a version lower than 3.5. The table above says that if you have PyTorch version 2.8, then install Triton 3.4. Back in the command prompt, press the up arrow to recall the previous command and modify it to "list". This is my Torch version, and similarly, you can check yours. So I will copy the install command, press the up-arrow key, modify the line, paste the command, and press Enter. Now Triton is installed. The installation command is also given below.

I will save the workflow and restart ComfyUI. I have queued the saved workflow, and a window of 77 frames is in progress. One step took 61 seconds, which is slow. This is because the resolution here is around 800 pixels. My graphics card has 16 gigabytes of memory, but it is still a 4060. I cancelled the generation, and now it won't stop, so I have to force-terminate the process.

I will start again with a lower resolution. This is just to maintain the aspect ratio. Processing time for one step is around 18 seconds. We can increase the resolution later or maybe upscale the smaller resolution.

The second window is processing 77 frames again instead of just the remaining frames. The result will have a section with no sound and weird motion because the video I uploaded has only 109 frames. To avoid such cases, you can divide the 109 frames into two batches. The generation took 260 seconds, and remember, we need Sage Attention to speed up the generation.

The character was supposed to be this one, from the image above. The stairs and entrance are from the yellow image above. A different section got swapped. This mask area got shifted to the right side, which means the swap feature is working, but these green and yellow pointers are shifted. This happened when I changed the resolution; the video box on the right became smaller. Now you will not be making this mistake. To make the mask precise, you can change this "block mask" value, and now the boxes are smaller. I will keep the value at 32 and try to keep things as default as I can. So, I ran the workflow and it worked, but with a few extra frames. To fix this, I can make a few changes above. Now I can divide the total frames into two sections, or make one batch of 80 frames by removing some frames.
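To recap the Triton step as commands: these are run from the python_embeded folder of the ComfyUI portable install, and the package name and version pin below follow the triton-windows repository's guidance for PyTorch 2.8 as described above, so double-check them against the table in that repository before running:

    rem run inside ComfyUI_windows_portable\python_embeded
    rem remove any older Triton install
    python.exe -m pip uninstall triton triton-windows
    rem check which torch version is installed
    python.exe -m pip list
    rem PyTorch 2.8 pairs with Triton 3.4, hence the "lower than 3.5" pin
    python.exe -m pip install -U "triton-windows<3.5"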
Wan video is good with 80 frames, so I will let it be in one window and remove these 28 frames. I will skip the first 28 frames and keep the remaining 81 frames in a single window. I have to change this value to 81 so it does not create two windows, and then queue the workflow. And this is the result, which looks good. The lighting on the character did not change, but it does change when using a larger model. The generation time was 150 seconds, and for some reason the window has 85 frames. With these settings, I can process longer generations with multiple windows of 85 frames.

Now let's look at Sage Attention. These are the Sage Attention releases; I need to download one of these files. In the python_embeded folder, type this command, and you should see the Torch version installed. And this is my Python version. The cp39 tag goes along with the Python version, and the Torch version 2.8 matches here, but the CUDA version is different. I will download the latest version available and save it inside the python_embeded folder. Now I will type the install command: after the word 'install', type a dot and a forward slash, then type 'sage' and press Tab. The file name should complete automatically; then press Enter. It says the installation was successful.

I have missed something in the Triton setup, I just remembered. This zip file must be downloaded. This is for Python 3.13, and the content from the file needs to be kept inside the python_embeded folder.

I restarted ComfyUI, selected Sage Attention, and queued the workflow. Processing one step took 14 seconds, but the generation time is similar. Maybe the second generation will be faster. The result looks fine, with a minor difference.

The zip file I downloaded was not used, and Sage Attention somehow worked. However, it was part of the instructions, so I will follow them and extract the file: copy the content from inside and paste it into the python_embeded folder. This can be deleted.

After a restart, I selected the FP8 file, which is larger than my graphics memory. I am also using the BF16 VAE. The generation time was around 170 seconds, and it should work for longer video generation as well. A few more GGUF models are visible now; Q4_K_S was uploaded 30 minutes ago.

If you want a single video in the result, just make a connection with the image directly, and it should work. I have added the links to Triton and Sage Attention. And the workflow is accessible in the browser section; these are the installed node packages. The workflow is available in the 'Wan video wrapper'. This is the same one I used in the video.

That is it in this video; hope you have an excellent day.
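For reference, the Sage Attention install described above, written out as commands. These are run from the same python_embeded folder; the wheel filename below is only a placeholder pattern, so replace it with the file you actually downloaded, matching your Python, Torch, and CUDA versions:

    rem run inside ComfyUI_windows_portable\python_embeded
    rem check the installed torch (and CUDA) version first
    python.exe -m pip list
    rem install the downloaded wheel; the filename below is an example pattern only
    python.exe -m pip install ./sageattention-x.y.z+cuXXXtorch2.8.0-cp39-abi3-win_amd64.whl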

Wan2.2 Animate: Generate long videos with Character Replacement (GGUF)

Channel: AI Ninja
