Transcript of Wan Alpha: This AI Creates Transparent Videos with PERFECT Glass & Smoke Effects.

Video Transcript:

Hello everyone! Today I'm going to introduce a very useful video model. Why do I say it's so useful? Because this model can generate high-quality video footage with transparency.

Let's take a look at the results. Using this model, I can very easily generate a video, and this video comes with an alpha channel, which is what we refer to as the transparency channel. Now, pay close attention, because this channel is very special. It's not like the channels you get from cutout tools like the Segment Anything extension, which are typically just solid white or solid black. This channel actually contains a lot of fine detail, as you can probably see. We can use this channel to place the characters from the video onto any background, achieving what's known as a transparent footage effect. As you can see, the blending result is fantastic: the foreground and background merge together perfectly.

But of course, that's not even the main point. If it were just about this, our Segment Anything tool could also achieve it. The key thing to note is that this model can preserve the transparent properties of objects. So, what does that mean? Look closely at this video I've generated here: it's a beautiful woman drinking juice. The glass she's holding is, of course, transparent. In the generated footage itself, you can't see anything special at first glance, but if you look at its alpha channel, you can see that the glass has transparency properties. So when we take this video and place it onto our background, you'll find that the background is perfectly visible through the transparent glass. This is what makes this model truly amazing.

This model is called Wan Alpha. It's actually a fine-tuned version of the Wan 2.1 model: it retains the ability of Wan 2.1 to generate very high-quality video while adding the capability to produce transparent videos. If you're interested, you can visit their official website to see more examples.

The foreground isn't just limited to the main subject, either. In some cases, parts of the background can also be included. For instance, in this example, besides the bird, the tree branch is also fully rendered as part of the foreground. This information is completely captured in our alpha channel, and what gets included depends entirely on how we write our prompts. This brings me to an important point: in all your prompts, you must include the word "transparent".
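To make the role of the alpha channel concrete, here is a minimal sketch of the standard alpha-over blend (my own illustration, not from the video), assuming numpy and Pillow and using hypothetical file names. The fractional alpha values are exactly what let glass and smoke stay partially see-through, unlike a binary cutout mask:

```python
import numpy as np
from PIL import Image

# Hypothetical file names, for illustration only.
fg = np.asarray(Image.open("foreground_frame.png").convert("RGB"), dtype=np.float32) / 255.0
bg = np.asarray(Image.open("background_frame.png").convert("RGB"), dtype=np.float32) / 255.0
alpha = np.asarray(Image.open("alpha_frame.png").convert("L"), dtype=np.float32) / 255.0

# Standard alpha-over blend. A cutout mask forces alpha to 0 or 1;
# Wan Alpha's mask holds fractional values, so glass and smoke let the
# background show through partially instead of being cut out hard.
out = alpha[..., None] * fg + (1.0 - alpha[..., None]) * bg

Image.fromarray((out * 255).astype(np.uint8)).save("composite_frame.png")
```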
Whether you're specifying that the video should be transparent or that the background should be transparent, you have to state it clearly in the prompt.

The creators have also been very thoughtful and provided a version for ComfyUI. You can open the Wan Alpha GitHub page and scroll down to find a dedicated section with instructions for using it in ComfyUI, including the download links for the models. The base model is Wan 2.1, as we've already mentioned. The fine-tuning is implemented as a LoRA, so you'll need to download this LoRA file; please note the name of the LoRA might not be finalized yet. The model also supports an acceleration model, so you can use light-x2v. Then you'll need to download the text encoder and the VAEs. Note that there are two VAEs, one for RGB and one for the alpha channel, which is a key difference. Once you've downloaded everything, just place the models into their corresponding folders according to the directory structure provided. You can download the example ComfyUI workflow from the link on their page.

To make this demonstration easier, I've also set up the workflow on RunningHub, which you can access at runninghub.ai. RunningHub is an online ComfyUI workspace I use frequently because they are always the first to integrate new extensions and technologies as soon as they are released. You can use the invitation link in my video description to register for RunningHub and receive 1,000 free credits. You also get 100 bonus credits for logging in daily, which is great for trying out your own workflows.

The workflow you see here is one I've slightly restructured to make it easier to see the final results. I have a basic configuration section where I've set the resolution (width and height) and the number of frames. I'm using 33 frames here, but you can go up to 81; my resolution is 832 by 480. Next is the model loading section. First we load the main Wan 2.1 text-to-video model, then we load the acceleration model, which is light-x2v. After that, we load the corresponding LoRA, which is the key component for generating transparent videos. Then comes the text encoder. And as I emphasized earlier, there are two VAEs, one RGB VAE and one alpha-channel VAE, so you need to load both here. The rest of the setup is basically identical to a standard Wan 2.1 text-to-video workflow.

Now, let's write a prompt. I'll keep it simple: "a little girl is blowing bubbles". And remember the crucial part: specify that the "background is transparent".
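The exact folder layout comes from the Wan Alpha GitHub page rather than the video, but a typical modern ComfyUI installation places these files roughly as follows (file names omitted, since the LoRA's name may not be finalized):

```
ComfyUI/models/
├── diffusion_models/   # Wan 2.1 text-to-video base model
├── loras/              # Wan Alpha LoRA and the light-x2v acceleration LoRA
├── text_encoders/      # text encoder for Wan 2.1 (UMT5, in most setups)
└── vae/                # two VAEs: one for RGB, one for the alpha channel
```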
This must be clearly written. After that, the process goes through the sampler. I recommend using 8 sampling steps for best results, but if you're concerned about speed, 4 steps will work. Because we are using the acceleration LoRA, we set the CFG to 1.

Next is the decoding process, which is a bit more complex here. First, we need to decode the RGB content, which gives you a sequence of images; to make it clearer, I'm not displaying it as a video here. The second step is decoding the alpha channel, which gives us a sequence of image masks. The main difference between these two decoding paths is that they use different VAEs.

The developers also created a custom node to display the transparent content. However, I'm not sure if it's an issue with the RunningHub implementation or the node itself, but right now it only shows a single frame instead of a video. So, to show you the result more clearly, I've extended the workflow a bit. I added a background video here, a park scene. Then, using the "Image Composite Masked" node, which we've used many times before, we can merge the two image sequences together using our mask. An important detail: the output from the alpha decoding is an image, so you first need to convert it into a mask before feeding it into the composite node. After doing that, we can blend our generated character with the background. And notice that the bubbles she blows are also transparent! This really shows that the results from this model are fantastic and quite different from traditional cutout methods.

Of course, you can test different prompts and backgrounds. For example, let's change the background and the prompt to "a beautiful woman is dancing", again making sure to specify that the video is transparent. We get another great blended result. However, after a few tests, I did notice a slight decrease in the aesthetic quality of the human figures. This issue doesn't seem to affect objects, though.

Let's look at this example. The background video is an oasis on a snowy mountaintop, shot from a bird's-eye view, and my prompt is to generate a flying eagle. First, let's look at the generated RGB video. Notice that at the beginning there's some background pixel information. You can see this in the mask as well, where the background is initially white but then gradually fades away, leaving the eagle as the main subject. As a result, when we composite it, you'll see clouds and mist at the start, and then the eagle slowly becomes the focus of the scene. The effect is stunning.

So what can you do if the quality of the people isn't perfect?
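If you'd rather do this last compositing step outside ComfyUI, here is a minimal sketch of the same idea, assuming imageio and numpy and using placeholder folder names. It mirrors the image-to-mask conversion and the "Image Composite Masked" logic, but it is an illustration, not a reproduction of the actual nodes:

```python
import glob
import imageio.v2 as imageio
import numpy as np

# Placeholder folder names; assumes the decoded RGB frames, decoded alpha
# frames, and background frames were exported as matching PNG sequences.
rgb_frames   = sorted(glob.glob("rgb_decode/*.png"))
alpha_frames = sorted(glob.glob("alpha_decode/*.png"))
bg_frames    = sorted(glob.glob("park_scene/*.png"))

frames = []
for rgb_f, a_f, bg_f in zip(rgb_frames, alpha_frames, bg_frames):
    fg = imageio.imread(rgb_f).astype(np.float32) / 255.0
    bg = imageio.imread(bg_f).astype(np.float32) / 255.0
    a  = imageio.imread(a_f).astype(np.float32) / 255.0
    # The alpha decode comes out as an image, so reduce it to a single
    # channel first (the image-to-mask conversion step from the workflow).
    if a.ndim == 3:
        a = a[..., 0]
    blend = a[..., None] * fg[..., :3] + (1.0 - a[..., None]) * bg[..., :3]
    frames.append((blend * 255).astype(np.uint8))

# Writing mp4 needs the imageio-ffmpeg backend; Wan 2.1 clips play at 16 fps.
imageio.mimsave("composited.mp4", frames, fps=16)
```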
As you might have guessed, since this model supports Wan 2.1 LoRAs seamlessly, you can simply add a character beautification LoRA. For instance, in this next example, I've added a LoRA that I previously trained specifically for the Wan 2.1 model. I don't recommend setting the weight to 1; I found 0.75 works well. My prompt is simply "she is dancing", and the background video is a simple scene of a mountain stream. Let's check out the result. To make it easier to see, I've combined everything into videos.

This is the original RGB video. The model tries to keep the background a solid color, but as you can see, around the hair there's still some pixel information. The mask, however, is incredibly detailed. You can see it contains depth information for the glasses, eyes, and hair; it's not just a flat white mask. When composited, the result is a very natural blend.

I generated some smoke, because smoke is transparent, and I also added a glass, which is also transparent. We can see that the smoke is present in the foreground information, and in the mask, besides the character details, the transparent quality of both the glass and the smoke is perfectly preserved. Let's look at the final blended result. The white smoke doesn't completely block the background; you can see the transparency is maintained. This effect is even more pronounced on the glass. Since the eyes also have a lot of transparent detail, you'll notice subtle changes in them that correspond to the ambient light. You can even see bits of the background through the gaps in her hair.

So, through these examples, you can see that with this technology we can create incredibly impressive transparent video assets. That's all for today. What are you waiting for? Go ahead and try it out for yourself! Follow me to become an AI expert.

Channel: Veteran AI
