Transcript of Wan 2.2 S2V Speech to Video 5 Steps #ComfyUI Native Workflow + Step by Step #s2v #wan22 #lipsync #ai

Video Transcript:

I feel the rhythm in my shoes. Every heartbeat moves with the groove. The lights are shining. Colors collide. The music takes near. All those moments will be lost in time, like tears in rain. Yet in their loss they give life its worth. Hello friends, welcome back to RareTutor. Today we are testing Wan S2V, audio-driven cinematic video generation based on the Wan 2.2 model. On the official page it is mentioned that Wan S2V is an AI video generation model that can transform static images and audio into high-quality videos. This model excels in film and television application scenarios, capable of presenting realistic visual effects including natural facial expressions, body movements, and professional camera work. It supports both full-body and half-body character generation and can complete various professional-level content creation needs such as dialogue, singing, and performance at high quality. The workflow used in this tutorial requires the following nodes; links will be provided in the video description. Let's start the tutorial.

Starting from the workflow officially available on the ComfyUI blog, I have created this workflow, which supports both safetensors and GGUF models in a single workflow. In the model loaders of this workflow I have used DisTorch 2.0 MultiGPU nodes, so low-VRAM users can also generate the video without any problem. Many users of this node have mentioned online that they were able to load large models with low VRAM without any problem. You can check the DisTorch 2.0 official GitHub page for more details on how to use it efficiently.

Step one: here you can select the GGUF or safetensors model; it's really easy to switch between them. I have selected the Wan S2V Q5 GGUF model. For all the links to the models and the workflow, please check the video description. Select the compute device as CUDA or whatever device you are using. If you have low VRAM, you can put the required virtual VRAM in gigabytes in this option. For low-VRAM users, select the donor device as CPU and set high-precision LoRA to true. If you choose the safetensors model, all settings are the same as above. Here select the CLIP model; here too you can choose the UMT5-XXL encoder as a GGUF model or a safetensors model. And here select the VAE model; again you can choose a VAE GGUF model or a safetensors model.

Now step two: if you are using Sage Attention, select the auto option; if you don't have Sage Attention, disable it. Here, select true to enable the FP16 accumulation option; if you don't want it, set it to false. Here, select the Wan 2.2 Lightning image-to-video high-noise LoRA with a strength of 0.80. Here, select the Wan 2.2 Lightning image-to-video low-noise LoRA with a strength of 0.80. Here select the LightX2V rank-32 Wan image-to-video LoRA with a strength of 0.50. Shift is 10. Here I have used the NAG attention node for better negative prompt usage; if you don't want it, you can disable it. Here, load the audio file. [Music] You can use this node to trim the audio. Load the vocal and instrument splitter model here. Here, select the Wav2Vec2 large model.

Step three: here, load the reference image. The width of the image is 960 and the height is 528. This is the prompt I used for this video. The official workflow uses 16 frames and, as per my testing, only 16 frames works best, but you can experiment with it. This is the main setup for long generation: a limited number of frames is generated in small windows, which are then combined into one video. For example, a window of 65 frames will generate a 4-second video at 16 fps.
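To make the window arithmetic above concrete, here is a minimal Python sketch of the length calculation. It is only an illustration, not a node from the workflow: the helper name is made up, the single-frame overlap between windows is an assumption, and the presenter simply rounds 65 frames at 16 fps to "about 4 seconds".

```python
# Rough arithmetic behind the long-generation setup described above.
# ASSUMPTIONS (not taken from the workflow itself): each extension window
# overlaps the previous one by a single frame, and 65 frames at 16 fps is
# rounded to roughly 4 seconds, as in the tutorial.

FPS = 16                # output frame rate used in this workflow
FRAMES_PER_WINDOW = 65  # frames generated per sampling window
NUM_WINDOWS = 7         # total batches in the 28-second example

def video_length_seconds(frames_per_window: int, num_windows: int, fps: int) -> float:
    """Approximate clip length once all windows are stitched together."""
    new_frames_per_window = frames_per_window - 1       # assumed 1-frame overlap
    total_frames = new_frames_per_window * num_windows + 1
    return total_frames / fps

total_seconds = video_length_seconds(FRAMES_PER_WINDOW, NUM_WINDOWS, FPS)
print(f"about {total_seconds:.0f} s of video")          # ~28 s, as in the tutorial

# The loaded audio should cover at least the planned video length, which is
# why 30 seconds is selected in the audio loader for the 28-second example.
AUDIO_SECONDS = 30
assert AUDIO_SECONDS >= total_seconds, "audio is shorter than the planned video"
```

With 7 windows of 65 frames the sketch lands at roughly 28 seconds, which matches the seven-batch example described next and explains why the audio is trimmed to 30 seconds for headroom.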
In a similar way, 10 windows can produce a 40-second video without putting pressure on the GPU. Here I am creating seven batches of 4 seconds each, that is 4 × 7, a total of 28 seconds. If you want to increase the length, you can either increase the frames in each window or increase the number of batches manually. I have selected 30 seconds in the audio loader to be on the safer side; you can select 28 seconds or more, but not less than the total length of the batches. This workflow can generate more than 30 seconds of video: seven video extending nodes are already available in the present workflow, and if you want more you can add them manually. In this way, you can generate videos of unlimited length. Like this, you can increase the video generation length so that you have one main batch and eight extended batches, nine batches in total; that is just an example.

Okay, let's get back to our 28-second video with seven batches being processed. Here I have set five steps and a CFG value of 1, selected UniPC as the sampler, and selected the simple scheduler. Steps four, five, and six are automatic. Step five is fully designed by the official workflow creator at ComfyUI, so full credit goes to him; this part here will remove unwanted, high-noise starting frames in the video. Thanks to the original creator. Okay, let's hit the run button. I feel the rhythm in my shoes, every heartbeat...

Here I select the Wan S2V safetensors file and select weight type FP8 E5M2, as I'm using an Nvidia RTX 3060 GPU; the rest of the settings are the same as with GGUF. Here I select the UMT5-XXL scaled safetensors CLIP model; the rest of the settings are the same as with GGUF, and likewise for the remaining loaders. Now go to the audio loader to change the audio file. Now we are generating an 8-second video using only one main sampler and one extender node. Okay, let's hit the run button. All those moments will be lost in time, like tears in rain. Yet in their loss they give life its worth.

To use this workflow you need to update ComfyUI to 0.3.56 or the latest version, and also make sure your ComfyUI frontend package version is 1.25.11 or the latest. Now see, I will downgrade the ComfyUI frontend version to an old version, and the workflow will not load. Now see, I will again upgrade the ComfyUI frontend version to the latest version, and the workflow will load without error. If anybody gets this error while generating the video, please enable Sage Attention by adding this text to the ComfyUI web UI .bat file. I hope you find this tutorial helpful. Please share your opinion in the comment section below. If you like my effort, please hit the like button and share it with your friends. Meet you in the next video.
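As a rough way to check the frontend requirement mentioned above, the sketch below reads the installed package version with importlib.metadata. The package name comfyui-frontend-package is an assumption based on ComfyUI's pip dependencies; verify the exact name against the requirements.txt of your own ComfyUI installation.

```python
# Minimal sketch for checking the ComfyUI frontend requirement mentioned above.
# ASSUMPTION: the frontend is installed as the pip package
# "comfyui-frontend-package"; confirm the name in your ComfyUI requirements.txt.
from importlib.metadata import PackageNotFoundError, version

REQUIRED = (1, 25, 11)

try:
    installed = version("comfyui-frontend-package")
    ok = tuple(int(p) for p in installed.split(".")[:3]) >= REQUIRED
    print(f"frontend {installed}: {'OK' if ok else 'too old, please upgrade'}")
except PackageNotFoundError:
    print("comfyui-frontend-package is not installed in this Python environment")
```

If it reports an old version, upgrading it with pip in the same Python environment that ComfyUI uses (or re-running your installation's normal update procedure) should bring it to 1.25.11 or later.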

Channel: RareTutor (AI Learning)
