Transcript of Qwen Edit 2509 - The Prompt You Must Know + Open Pose Control with Nunchaku Workflow
Video Transcript:
Qwen Image Edit 2509 performs great. Let's look at some of the results generated using the smaller FP8 model. My prompt was, "Change the figure in the picture to a professional photo with a blue background, wearing a black suit and tie, and take a close-up of the upper body facing the camera. The character should remain consistent without any changes." Generation time for the first image on my 4060 Ti 16 GB was 175 seconds, and the result maintained the facial identity of the given image.
My next prompt was, "Modify this image to make the person appear visibly heavier while maintaining a natural and realistic look." To maintain consistency, I specified, "The image should preserve the original's integrity, lighting, color palette, and overall style." The girl in the result looks heavier, as expected. The loss of detail is clearly visible in the image, but the base model is working great, so adding detail with a LoRA should not be a challenge.
Here I tried to extract the person in the middle of the image and create a close-up with a black background. The line I use frequently is, "Ensure that the character's face and body proportions remain unchanged." It was suggested by GPT and works every time (see the small helper after this section). The latent image provided to the KSampler comes from the same image I uploaded, but in this case an image with a 1:1 aspect ratio may look better, so a different-size latent can be supplied by connecting an Empty Latent Image node to the KSampler (see the sketch below). The generation time for the subsequent images was around 60 seconds, and the image looks better now.
I tried changing the face to a younger version of the girl in the picture, but the result was unrealistic. I tried a few times, and it did not work; maybe the model struggles with faces. So I tried making other changes to the person's image. I made the subject look bald, and the result looks natural, exactly how it could have been.
Next I tried manipulating the background. My prompt was to remove all the police officers in the background while keeping the subject. After a few tries, I think this is the best result I obtained. I specified that the proportions and background should remain unchanged, and instructed the model to redraw the details of the removed areas and match the elements with the image. The prompts used will be shared, so it becomes easier for you to work through these edits.
I tried adding cyborg-like enhancements and neon-illuminated wearables in a cyberpunk style to the same subject, specifying that these elements should be extended from the character's existing traits, like her body type, age, and gender. The change of background was not requested, but it goes very well with the subject.
The model understands and works well with clothes. My prompt was to extract the police uniform and place it on a white background as a product display picture. The tie seems to be fixed, and the police badge is added, which was covered by the character's braided hairstyle. Virtual try-on works as well with this model.
Changing the background works too. I tried adding the Eiffel Tower from Paris, without any streets or buildings behind it. The character's edges were identified flawlessly and the changes were made. In this case, I tried changing the color of the sofa to a brown leather sofa with an evening ambiance. The model identified the sofa on the left, and even the one that is barely visible. It also identified this chair and left it as it is.
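As a minimal sketch of the consistency trick mentioned above, here is a tiny helper that appends that exact phrase to any edit instruction. The function name and structure are my own illustration, not something taken from the video's workflow.

```python
# Minimal sketch: reuse the consistency phrase from the video as a prompt suffix.
# The helper name and structure are illustrative, not part of the actual workflow.

CONSISTENCY_SUFFIX = (
    "Ensure that the character's face and body proportions remain unchanged."
)

def build_edit_prompt(instruction: str) -> str:
    """Append the consistency suffix to an edit instruction."""
    return f"{instruction.strip()} {CONSISTENCY_SUFFIX}"

print(build_edit_prompt(
    "Extract the person in the middle and create a close-up with a black background."
))
```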
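And here is a rough sketch of the Empty Latent Image idea in ComfyUI's API (prompt JSON) form, assuming the standard EmptyLatentImage and KSampler nodes. The node IDs, the sampler settings, and the omitted parts of the graph (model loader, text encoders, VAE decode) are illustrative, not the exact workflow from the video.

```python
# Sketch of overriding the latent size, in ComfyUI API (prompt JSON) form.
# Node IDs ("3", "5", ...) and the omitted parts of the graph are illustrative.
prompt_graph = {
    "5": {
        "class_type": "EmptyLatentImage",
        # 1:1 latent instead of deriving the size from the uploaded image
        "inputs": {"width": 1024, "height": 1024, "batch_size": 1},
    },
    "3": {
        "class_type": "KSampler",
        "inputs": {
            "model": ["4", 0],         # diffusion model loader (not shown)
            "positive": ["6", 0],      # TextEncodeQwenImageEditPlus output (not shown)
            "negative": ["7", 0],
            "latent_image": ["5", 0],  # <- empty latent instead of the encoded upload
            "seed": 0,
            "steps": 20,               # FP8 recommendation from the workflow note
            "cfg": 1.0,                # placeholder; use the value from the workflow note
            "sampler_name": "euler",
            "scheduler": "simple",
            "denoise": 1.0,
        },
    },
}
```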
The image given was some random pots, and the prompt asked to turn them into a product photograph. The handle is visible on one of the pots and is placed perfectly in the result. My prompt here was, "Put some nice flowers in it," which is just a normal, conversational sentence, and it works. I tried multiple images with watermarks, and the prompt I am using here worked on all of them. I will share the prompt, so it becomes easier for you.
I tried getting some kind of design pattern. I am not in this profession, so I do not know what this diagram is called; someone in that field would get the idea if they came across this image. I tried adding a rear wing to this vehicle, and it was added correctly, though I was not sure whether it is called a wing or a spoiler.
"Generate a close-up of the girl's face. Ultra-sharp details, like a professional photographer. Ensure that the character's face remains unchanged. Maintain the elements, pose, proportions, and features." The face generated here looks great; if I zoom in on the original image, the character's face becomes pixelated. The details of the face are not present in the source image, yet they have been generated flawlessly. The facial identity might change in such cases if you do not use a character LoRA, but the results generated here closely resemble the character from the given image.
"Adjust the woman's facial expression so she looks surprised." "Adjust the woman's hand and make her hold a rifle." The rifle blends with the painting. I tried changing prompts and generated multiple times, but the result always had bad hands.
The workflow I am using is simple. The model selected here is the FP8 model, and this is where you can download it from; above it is the larger 40 GB model, which should be even better than what you just saw. The GGUF models were also released a few hours ago, with file sizes ranging between 7 GB and 21 GB. These are the lightning LoRAs; I suggest using this image edit LoRA. I tried the image LoRA below, and that worked as well.
A few hours ago, the Nunchaku version of Qwen Edit 2509 was released as well. The FP4 models are for RTX 5000-series cards. The r128 model gives a better-quality image than this one. The two models below should be used on systems with RTX 4000-series or older cards. Place these models in the diffusion_models folder. The files below are some of the common files we use all the time, like this lightning LoRA, the CLIP file, and the VAE.
The image node connects to the prompt node. This "TextEncodeQwenImageEditPlus" node does more than process the text. The rest is the same as I explained in the previous Qwen videos. If you do not find this node even after updating ComfyUI, switch to the nightly version and restart. Links to the files are added here, but by the time I was making this workflow, ComfyUI had released a workflow for this model in the template section, and the recommended files there are the same ones I was using. That workflow has the option to upload three images; the one I was testing had an option for two images, so the third image connection is not used, and the rest is similar.
This note describes the steps and CFG for different models. With the original 40 GB model, 50 steps are recommended for the best results, and 20 steps for the FP8 model. If you use a lightning LoRA, you can change the steps in the KSampler based on the step count specified for that LoRA (a small lookup sketch follows below).
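As a minimal sketch of the note above, here is a small lookup of the recommended step counts mentioned in the video. The dictionary keys and the function are my own illustration; CFG values are not specified in the transcript, so they should be taken from the workflow note rather than from this sketch.

```python
# Recommended KSampler steps mentioned in the video; key names are illustrative.
# CFG is not given in the transcript, so read it from the workflow note instead.
RECOMMENDED_STEPS = {
    "qwen-edit-2509-bf16": 50,            # original ~40 GB model
    "qwen-edit-2509-fp8": 20,             # smaller FP8 model
    "qwen-edit-2509-lightning-8step": 8,  # match the step count of the lightning LoRA you load
}

def ksampler_steps(model_key: str, default: int = 20) -> int:
    """Return the recommended step count for a model variant."""
    return RECOMMENDED_STEPS.get(model_key, default)

print(ksampler_steps("qwen-edit-2509-fp8"))  # 20
```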
The note below covers what I showed you: adjusting the image size using the Empty Latent Image node instead of the uploaded image. And this part is about using the resize node; if you upload a larger image, the latent image for the KSampler will be adjusted to around 1024 pixels (a small resize sketch follows after this section). Enable the image upload node to use multiple images in the generation process.
Let's look at some examples. I am using my two-image workflow with the FP8 model and the 8-step LoRA, and my prompt was about placing the headphones over the head of the person in the image. It is not a bad generation, but it failed to reproduce the correct product; I wonder if the original model could manage in such a case. My prompt was, "The woman is sitting on the bench." Image one is on the left, and then the second image, but the workflow works even if I upload the images in a random order.
I went back to the single-image workflow and generated a close-up of the woman sitting on the bench. Then, in the multi-image workflow, I uploaded an image of a handbag. My prompt was, "The woman with a handbag, sitting on a bench, the handbag over her shoulder. Ensure that the character's face and body proportions remain unchanged." The handbag generated is accurate. If I still want to pick out mistakes, it would be the handle: the generated handle is long, while in the original image the handle is stitched to the bag.
This is the workflow from ComfyUI's templates, and I have uploaded three images. My prompt is, "The two girls are sitting on the couch drinking coffee." The result has coffee, with some breakfast items on a table, but the faces look different from the original images. I modified the prompt by adding, "Maintain pixel-perfect fidelity to original facial features." The girls are now sitting on the floor, but the faces match the original images.
I made a few changes to use the OpenPose skeleton: the uploaded image will be used to extract the pose, and the same pose will control the image generation process of the girl. For the best results, the height and width of the empty latent image come from the pre-processor. My prompt was simple: the girl is sitting on the couch. Make sure to keep the size around 1024 pixels when using the pre-processor for the empty latent image (see the pose-extraction sketch below). There are multiple options to control the image generation; to check, you can bypass this section and then select one from the dropdown. Run the workflow, and the result should be generated in a few seconds.
To use the GGUF models, just add the UNet loader; the rest of the workflow can be used as is. I found some GGUF models giving bad results. If you have an 8 GB card, try the Q4_K_S model, as it gave some decent results. However, I did not ask for these glasses.
While making this video, the Nunchaku team released the lightning models. The FP4 models should be used on RTX 5000-series graphics cards. This is the lightning model, and this is the slower 4-bit model. I tried using the slower model, and it took 19 seconds per step; I was testing for quality with 40 steps, and the generation time was 13 minutes. The r128 models are better than the r32 models. The INT4 models should work with RTX 4000-series and older cards, and these are the lightning models for the same cards. Adjust the number of steps in the KSampler based on this, and then it should work. Place the file in the diffusion_models folder. In the workflow, this DiT loader will load the downloaded model.
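As a rough sketch of the resize behaviour described above, here is a small helper that scales an image so its longer side lands around 1024 pixels while keeping the aspect ratio. Whether the actual resize node targets the long side or total megapixels is an assumption, as is snapping the result to multiples of 8 (a common latent-space requirement); neither detail is stated in the video.

```python
from PIL import Image

def resize_for_latent(img: Image.Image, target: int = 1024, multiple: int = 8) -> Image.Image:
    """Scale so the longer side is roughly `target` px, keeping the aspect ratio.

    Targeting the long side and snapping to multiples of 8 are assumptions made
    for this sketch, not behaviour confirmed by the video's resize node.
    """
    w, h = img.size
    scale = target / max(w, h)
    new_w = max(multiple, round(w * scale / multiple) * multiple)
    new_h = max(multiple, round(h * scale / multiple) * multiple)
    return img.resize((new_w, new_h), Image.LANCZOS)

# Example: a 3000x2000 upload becomes roughly 1024x680 before encoding.
```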
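For the pose-controlled generation, the video uses a ComfyUI pre-processor node. As a rough stand-in outside ComfyUI (not the exact node from the video), here is what extracting an OpenPose skeleton looks like with the controlnet_aux library, assuming it is installed; the file paths are hypothetical.

```python
from PIL import Image
from controlnet_aux import OpenposeDetector

# Rough stand-in for the ComfyUI OpenPose pre-processor node used in the video.
detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

source = Image.open("pose_reference.png")   # hypothetical input path
pose_map = detector(source)                 # skeleton image that conditions the generation

# The pose map's width/height can then feed the Empty Latent Image node,
# keeping the size around 1024 px as recommended in the video.
pose_map.save("pose_skeleton.png")
print(pose_map.size)
```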
If you have a problem with this, I have shown how to set up the Nunchaku workflow in multiple previous videos; links will be in the description. So, I have uploaded these three images, and the prompt used here may look a bit different. The generation speed is faster than the FP8 model, and I am sure this comes at the cost of quality. The generated image has a few unwanted changes, so I tried again, modifying the prompt to maintain consistency. The generation was faster than before, but the t-shirt is still a different color. I did manage to get the perfect result within a few attempts.
The OpenPose skeleton works very well with the Nunchaku models. I have made the connection with the pre-processor and disabled the third image node. Virtual try-on works as well; just add a detail LoRA, and the image should be good for use. So, that is it for this video. I hope you have an excellent day.
Channel: AI Ninja