Transcript of Qwen Edit 2509 - The Prompt You Must Know + Open Pose Control with Nunchaku Workflow

Video Transcript:

Qwen Image Edit 2509 performs great. Let's look at some of the results generated using the smaller FP8 model.

My prompt was: "Change the figure in the picture to a professional photo with a blue background, wearing a black suit and tie, and take a close-up of the upper body facing the camera. The character should remain consistent without any changes." Generation time for the first image on my 4060 Ti 16 GB was 175 seconds, and the result maintained the facial identity of the source image.

My next prompt was: "Modify this image to make the person appear visibly heavier while maintaining a natural and realistic look." To maintain consistency, I specified: "The image should preserve the original's integrity, lighting, color palette, and overall style." The girl in the result looks heavier, as expected. Some loss of detail is clearly visible in the image, but the base model is working great, so adding detail with a LoRA should not be a challenge.

Here I tried to extract the person in the middle of the image and create a close-up with a black background. A prompt I use frequently is: "Ensure that the character's face and body proportions remain unchanged." GPT suggested it, and it works every time.

The latent image provided to the KSampler comes from the same image I uploaded, but in this case an image with a 1:1 aspect ratio may look better. A latent of a different size can be supplied by connecting an Empty Latent Image node to the KSampler (see the sizing sketch at the end of this passage). Generation time for the subsequent images was around 60 seconds, and the image looks better now.

I tried changing the face to a younger version of the girl in the picture, but the result was unrealistic. I tried a few times, and it did not work; maybe the model struggles with faces. So I tried other changes to the person's image instead. I made the subject look bald, and the result looks natural, exactly how it could have been.

Next I tried manipulating the background. My prompt was to remove all the police in the background while keeping the subject. After a few tries, this is the best result I obtained: I specified keeping the proportions and background unchanged, and instructed the model to redraw the details of the removed areas and match those elements with the rest of the image. The prompts used will be shared, so it becomes easier for you to work through these edits.

I tried adding cyborg-like enhancements and neon-illuminated wearables in a cyberpunk style to the same subject, with these elements extended from the character's existing traits like her body type, age, and gender. I did not ask for the background change, but it goes very well with the subject.

The model understands and works well with clothes. My prompt was to extract the police uniform and place it on a white background as a product display picture. The tie appears fixed in place, and the police badge is added, which was covered by the character's braided hairstyle. Virtual try-on works with this model as well.

Changing the background works too. I tried adding the Eiffel Tower, without any streets or buildings behind it; the character's edges were identified flawlessly and the changes were made. In another case, I tried changing the color of the sofa to a brown leather sofa with an evening ambiance. The model identified the sofa on the left, even the part that is not visible, and it also identified this chair and left it as it is.
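A quick sketch of that latent-sizing tip: ComfyUI's Empty Latent Image node takes an explicit width and height, and the video keeps sizes around 1024 pixels. Assuming the usual practice of rounding dimensions to multiples of 16 (so the VAE's 8x downscale divides cleanly), a helper like this could compute a size for a chosen aspect ratio. The function name and defaults are mine, not from the workflow.

```python
import math

def latent_size(aspect_w: int, aspect_h: int,
                target_area: int = 1024 * 1024,
                multiple: int = 16) -> tuple[int, int]:
    """Pick width/height for an Empty Latent Image node.

    Keeps the total pixel area near `target_area` (the video stays
    around 1024 px) while matching the requested aspect ratio,
    rounded to `multiple` so the 8x VAE downscale divides cleanly.
    Assumed helper, not part of the actual workflow.
    """
    ratio = aspect_w / aspect_h
    height = math.sqrt(target_area / ratio)
    width = height * ratio
    # Round both sides to the nearest safe multiple.
    width = max(multiple, round(width / multiple) * multiple)
    height = max(multiple, round(height / multiple) * multiple)
    return int(width), int(height)

print(latent_size(1, 1))    # (1024, 1024) -- square close-up
print(latent_size(16, 9))   # (1360, 768)  -- widescreen framing
```

Feed the resulting width and height into the Empty Latent Image node instead of VAE-encoding the uploaded image when you want a different framing.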
The next image given was some random pots, and the prompt asked to turn them into product photography. A handle is visible on one of the pots, and it is placed perfectly in the result. My prompt here was simply "put some nice flowers in it," a normal conversational sentence, and it works.

I tried multiple images with watermarks, and the prompt I am using here worked on all of them. I will share the prompt, so it becomes easier for you.

I also tried generating a design pattern. I am not in this profession, so I do not know what this kind of diagram is called, but someone in that field should recognize it. I tried adding a rear wing to this vehicle and it was added correctly, though I was not sure whether it is called a wing or a spoiler.

"Generate a close-up of the girl's face. Ultra-sharp details, like a professional photographer. Ensure that the character's face remains unchanged. Maintain the elements, pose, proportions, and features." The face generated here looks great. If I zoom in on the original image, the character's face becomes pixelated; those facial details are not present in the source, yet they were generated flawlessly. The facial identity might change in such cases if you do not use a character LoRA, but the results here closely resemble the character from the given image.

"Adjust the woman's facial expression so she looks surprised." "Adjust the woman's hand and make her hold a rifle." The rifle blends with the painting. I changed prompts and generated multiple times, but the result always had bad hands.

The workflow I am using is simple; the model selected here is the FP8 model. This is where you can download the model from. Above it is the larger 40 GB model, which should be even better than what you just saw. The GGUF models were also released a few hours ago, with file sizes ranging between 7 GB and 21 GB. These are the lightning LoRAs, and I suggest using this image-edit LoRA; I tried the image LoRA below as well, and that also worked.

A few hours ago, the Nunchaku version of Qwen Edit 2509 was released as well. The FP4 models are for 50-series cards. The r128 model gives better image quality than the r32 model. The two models below should be used on systems with 40-series or older cards. Place these models in the diffusion models folder. The files below are the common files we use all the time, like the lightning LoRA, the CLIP file, and the VAE.

The image node connects to the prompt node. The "TextEncodeQwenImageEditPlus" node does more than process the text; the rest is the same as I explained in the previous Qwen videos. If you do not find this node even after updating ComfyUI, switch to the nightly version and restart.

Links to the files are added here. Around the time I was making this workflow, ComfyUI released a workflow for this model in the template section, and its recommended files are the same ones I was using. That workflow has the option to upload three images; the one I was testing had an option for two, so the third image connection is not used, and the rest is similar.

This note covers the steps and CFG for different models: with the original 40 GB model, 50 steps are recommended for the best results, and 20 steps for the FP8 model.
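Condensing those step recommendations into one place, here is a small lookup sketch you could mirror in the KSampler. Only the step counts come from the video (50 for the full model, 20 for FP8, and 8 for the 8-step lightning LoRA used later); the CFG of 1.0 is the usual convention for lightning-style distilled LoRAs, and the dictionary keys are illustrative labels, not actual file names.

```python
# Step counts as stated in the video's workflow note.
# The cfg of 1.0 for the lightning LoRA is a common convention
# for distilled setups, not a value read from the note -- an assumption.
SAMPLER_SETTINGS = {
    "qwen-edit-2509 full (~40 GB)":  {"steps": 50},              # best quality
    "qwen-edit-2509 FP8":            {"steps": 20},
    "FP8 + 8-step lightning LoRA":   {"steps": 8, "cfg": 1.0},
}
```

If you use a different lightning LoRA, set the KSampler steps to whatever count that LoRA advertises.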
If you use a lightning LoRA, you can change the steps in the KSampler based on the step counts specified here. The note below is what I showed you earlier: adjusting the image size with the Empty Latent Image node instead of the uploaded image. And this one is about the resize node: if you upload a larger image, the latent image for the KSampler will be adjusted to around 1024 pixels.

Enable the image upload node to use multiple images in the generation process. Let's look at some examples. I am using my two-image workflow with the FP8 model and the 8-step LoRA, and my prompt was about placing the headphones over the head of the person in the image. It is not a bad generation, but it failed to reproduce the correct product; I wonder if the original model could manage in such a case.

My prompt was, "The woman is sitting on the bench." Image one is on the left, then the second image, but the workflow works even if I upload the images in a random order. I went back to the single-image workflow and generated a close-up of the woman sitting on the bench, and then, in the multi-image workflow, I uploaded an image of a handbag. My prompt was: "The woman with a handbag, sitting on a bench, the handbag over her shoulder. Ensure that the character's face and body proportions remain unchanged." The generated handbag is accurate. If I want to nitpick, it is the handle: the generated handle is long, while in the original image the handle is stitched to the bag.

This is the workflow from ComfyUI's template, and I have uploaded three images. My prompt is, "The two girls are sitting on the couch drinking coffee." The result has coffee, with some breakfast items on a table, but the faces look different from the original images. I modified the prompt by adding "maintain pixel-perfect fidelity to original facial features." Now the girls are sitting on the floor, but the faces match the original images.

I made a few changes to use the OpenPose skeleton: the uploaded image is used to extract the pose, and that same pose controls the image generation of the girl. For the best results, the height and width for the Empty Latent Image come from the pre-processor. My prompt was simple: "The girl is sitting on the couch." Make sure to keep the size around 1024 pixels when using the pre-processor output for the Empty Latent Image. There are multiple options to control the image generation; to compare, you can bypass this section and select one from the dropdown. Run the workflow, and the result should be generated in a few seconds.

To use the GGUF models, just add the UNet loader; the rest of the workflow can be used as is. I found some GGUF models giving bad results. If you have an 8 GB card, try the Q4_K_S model, as it gave some decent results, though I did not ask for these glasses.

While I was making this video, the Nunchaku team released the lightning models. The FP4 models should be used on 50-series graphics cards: this is the lightning model, and this is the slower 4-bit model. I tried the slower model, and it took 19 seconds per step; testing for quality at 40 steps, the generation time was 13 minutes. The r128 models are better than the r32 models. The INT4 models should work with 40-series and older cards, and these are the lightning models for the same cards (see the model-selection sketch after this passage).
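That card-to-model mapping can be summarized as a tiny helper, purely as a sketch. The rule itself is from the video (FP4 for 50-series, INT4 for 40-series and older, rank-128 over rank-32 for quality), but the returned file names are hypothetical placeholders, not the actual Nunchaku release names.

```python
def pick_nunchaku_variant(gpu_series: int, prefer_quality: bool = True) -> str:
    """Choose a Nunchaku Qwen Edit 2509 variant by GPU generation.

    Per the video: FP4 builds target RTX 50-series cards, INT4 builds
    target 40-series and older, and rank-128 (r128) models give better
    quality than rank-32 (r32) ones. The returned string is an
    illustrative placeholder, not a real file name.
    """
    precision = "fp4" if gpu_series >= 50 else "int4"
    rank = "r128" if prefer_quality else "r32"
    # Hypothetical naming pattern -- check the actual release page.
    return f"nunchaku-qwen-image-edit-2509-{precision}-{rank}.safetensors"

print(pick_nunchaku_variant(50))  # FP4 r128 -- 50-series card
print(pick_nunchaku_variant(40))  # INT4 r128 -- 40-series or older
```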
Adjust the number of steps in the KSampler based on this, and it should work. Place the file in the diffusion models folder. In the workflow, this DiT loader will load the downloaded model. If you have a problem with this, I have shown how to set up a Nunchaku workflow in several previous videos; links will be in the description.

So, I have uploaded these three images, and the prompt used here may look a bit different. The generation speed is faster than the FP8 model, though I am sure this comes at the cost of quality. The generated image has a few changes, so I tried again, modifying the prompt to maintain consistency. The generation was faster than before, but the t-shirt is still a different color. I did manage to get the perfect result after a few attempts.

The OpenPose skeleton works very well with the Nunchaku models; I connected the pre-processor and disabled the third image node. Virtual try-on works as well: just add a detail LoRA, and the image should be good to use.

So, that is it for this video. I hope you have an excellent day.

Channel: AI Ninja
