Transcript of Wan2.2 Animate: Generate long videos with Character Replacement (GGUF)
Video Transcript:
The workflow and setup can be complicated, so I have downloaded and installed ComfyUI portable. I have installed the ComfyUI Manager, and, just to make sure, I have also updated the program using the update batch file. Now I will run ComfyUI and get the workflow from the "Wan video repository". This is the workflow: copy it and paste it inside ComfyUI. Almost all the nodes are missing, so I will close this and install them one by one from the missing-node list, with the help of ComfyUI's node manager. After a restart, all the highlights on the missing nodes are gone.

This is my models folder, containing models I downloaded before; I will keep it in the new ComfyUI. These are links to the models, and these FP8 models go in the diffusion models folder. I read a comment stating that this one should be used with 4000-series or newer cards, and the 'e5m2' file below it with 3000-series and older cards. These models are around 18 gigabytes, which can be large for some users, so there are smaller GGUF files, which should be kept inside the unet folder. The GGUF files are being added here right now, and I will add a link to these files in the ComfyUI notes. The model file will be selected here, and right now I have the FP8 and the Q4 model. Let's try the GGUF model first.

Below is the VAE, and this FP32 one looks large. You can also use the older Wan 2.1 VAE in case the FP32 fails; I will add a direct download link so you don't have to look for it. The next file is the CLIP vision file, and I have added the link to it in the notes. There are two LoRAs used in this workflow, and the links to both are already given. If you have a bigger graphics card, try disabling this block swap node, or decrease the "blocks to swap" value; the more blocks you swap, the slower the generation will be. Next is the text encoder; the link to the file is already given, but this file is too large. You can go one step back and scroll down; at the bottom you should find the FP8 file, which is smaller. I will add the link to this file in the workflow, and I will be using the smaller file in this video. The rest can be kept as it is; a quick map of where these files go is given after this section.

Here I need to upload the image of the character I want in motion, and this is where the video frames are analyzed for swapping the character. I will upload a video here. The video has 109 frames, and the pointers here will mark coordinates on each frame to locate the character to swap. Here, the program will show us the marking as a mask, which will be in focus during the swapping process. Let's have a look; I will bypass the nodes below so the video generation does not start. This node will download a file for this process if it is not already present. I will queue the workflow. The segmentation file was not downloaded in my case because it was already present in the models folder. The black section has been identified; this is where the swapping happens. The green circles mark the area to mask, and the red ones mark the area not to mask. These values are sent on to the segmentation node. By holding the Shift key and left- or right-clicking, you can add more pointers here, and you can clear the existing pointers with this button. Once you are done adjusting, press Run to check whether the mask is correct. Below is the face identified from the given video; it will be used to match the lip movement and expressions. Now I can activate these nodes and try to run the workflow. I forgot to give an image to swap.
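A rough map of where the files mentioned above usually go, assuming the default model-folder layout of ComfyUI portable (the exact file names depend on which variants you download):

    ComfyUI\models\diffusion_models\   <- FP8 .safetensors models (e4m3fn / e5m2)
    ComfyUI\models\unet\               <- GGUF models (Q4 and similar quants)
    ComfyUI\models\vae\                <- VAE (FP32, BF16, or the older Wan 2.1 VAE)
    ComfyUI\models\clip_vision\        <- CLIP vision file
    ComfyUI\models\loras\              <- the two LoRAs used by the workflow
    ComfyUI\models\text_encoders\      <- text encoder (the smaller FP8 variant also works)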
So, the image is uploaded, and now the workflow should give us some errors. The first error is about Sage Attention, which is missing. Let's say I do not want to use Sage Attention: I will select SDPA and try again. The 109 frames of the video are divided into two windows to process, and the division is based on this value of 77. Then I got the second error, which says it cannot find Triton. So this 77 is the number of frames in one window, and you can increase it, but keeping more than 80 frames will impact the quality. Getting back to the error: the program was about to process a batch of 77 frames, and then came the "Triton is missing" error.

I will get Triton from this repository. Scroll down and look at the sixth step: this redistributable file needs to be installed. It is simple, just double-click it and it should be installed. Now you need to run a few commands from here. Copy this uninstall command, go to the Python embedded folder ('python_embeded' in ComfyUI portable), open CMD, type this, and paste the command. This removes any older version of Triton that is installed. Now, the install command: the less-than sign means install a version lower than 3.5. The table above says that if you have PyTorch version 2.8, then install Triton 3.4. Back in the command prompt, press the up arrow to recall the previous command and modify it to "list". This shows my Torch version, and you can check yours the same way. So I will copy the install command, press the up-arrow key, modify the line, paste the command, and press Enter. Now Triton is installed. The installation command is also given below, and the commands from this part are summarized after this section. I will save the workflow and restart ComfyUI.

I have queued the saved workflow, and a window of 77 frames is in progress. One step took 61 seconds, which is slow. This is because the resolution here is around 800 pixels; my graphics card has 16 gigabytes of memory, but it is still a 4060. I cancelled the generation, but it would not stop, so I had to force-terminate the process. I will start again with a lower resolution; this value is just to maintain the aspect ratio. Processing time for one step is now around 18 seconds. We can increase the resolution later, or maybe upscale the smaller resolution.

The second window processes 77 frames again instead of just the remaining frames. The result will have a section with no sound and weird motion, because the video I uploaded has only 109 frames. To avoid such cases, you can divide the 109 frames into two batches. The generation took 260 seconds, and remember, we still need Sage Attention to speed up the generation. The character was supposed to be this one, from the image above; the stairs and entrance are from the yellow image above, but a different section got swapped. The mask area got shifted to the right side, which means the swap feature is working, but these green and yellow pointers are shifted. This happened when I changed the resolution; the video box on the right became smaller. Now you will not be making this mistake. To make the mask more precise, you can change this "block mask" value, and now the boxes are smaller. I will keep the value at 32 and try to keep things as default as I can. So I ran the workflow and it worked, but with a few extra frames. To fix this, I can make a few changes above: I can either divide the total frames into two sections or make one batch of 80 frames by removing some frames. Wan video is good with around 80 frames, so I will keep it in one window and remove these 28 frames.
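A sketch of the Triton commands from this part, assuming ComfyUI portable (so Python lives in the 'python_embeded' folder) and the 'triton-windows' package from the repository shown in the video; check that repository's table for the exact version pin that matches your PyTorch.

    rem Run from inside ComfyUI_windows_portable\python_embeded (cmd.exe).
    rem Remove any previously installed Triton build first:
    python.exe -m pip uninstall triton triton-windows
    rem Check which torch (and Python) version is installed:
    python.exe -m pip list
    rem Install the Windows Triton build; PyTorch 2.8 pairs with Triton 3.4.x,
    rem hence the "lower than 3.5" pin used in the video:
    python.exe -m pip install -U "triton-windows<3.5"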
I will skip the first 28 frames and keep the remaining frames in a single window. I have to change this value to 81 so it does not create two windows, and then I queue the workflow. And this is the result, which looks good. The lighting on the character did not change, but it does change when using a larger model. The generation time was 150 seconds, and for some reason the window has 85 frames. With these settings, I can process longer generations with multiple windows of 85 frames.

Now let's look at Sage Attention. These are the Sage Attention releases, and I need to download one of these files. In the python_embeded folder, type this command and you should see the Torch version installed, and this is my Python version. The "cp39" part goes along with the Python version, and the Torch version 2.8 matches here, but the CUDA version is different. I will download the latest version available and save it inside the python_embeded folder. Now I will type the install command: after the word "install", type a dot and a forward slash, then type "sage" and press Tab. The file name should complete automatically; then press Enter. It says the installation is successful (these commands are also sketched after this section).

I just remembered that I missed something in the Triton setup: this zip file must be downloaded. It is for Python 3.13, and its contents need to be kept inside the python_embeded folder. I restarted ComfyUI, selected Sage Attention, and queued the workflow. Processing one step took 14 seconds, but the overall generation time is similar; maybe the second generation will be faster. The result looks fine, with a minor difference. The zip file I downloaded was not used, and Sage Attention somehow worked anyway. However, since it was part of the instructions, I will follow them: extract the file, copy the contents, and paste them into the python_embeded folder. The archive can then be deleted.

After a restart, I selected the FP8 file, which is larger than my graphics memory, and I am also using the BF16 VAE. The generation time was around 170 seconds, and it should work for longer video generation as well. A few more GGUF models are visible now; Q4_K_S was uploaded 30 minutes ago. If you want a single video in the result, just make a connection with the image directly, and it should work. I have added the links to Triton and Sage Attention. The workflow is accessible in the browser section; these are the installed node packages, and the workflow is available in the 'Wan video wrapper'. This is the same one I used in the video. That is it for this video; hope you have an excellent day.
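A sketch of the Sage Attention install from this part, again assuming ComfyUI portable; the wheel file name below is a placeholder, so use the actual file you downloaded from the Sage Attention releases page that matches your Python, PyTorch, and CUDA build.

    rem Run from inside ComfyUI_windows_portable\python_embeded (cmd.exe),
    rem with the downloaded wheel saved in this same folder.
    rem Confirm the installed torch and Python versions first:
    python.exe -m pip list
    rem Install the wheel (placeholder name; Tab completion fills in the real one):
    python.exe -m pip install .\sageattention-2.x.x+cuXXXtorch2.8.0-cp39-abi3-win_amd64.whl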
Channel: AI Ninja