Chuck's AI Bumping Thread

charlescharles Posts: 846
edited October 29 in Daz AI Studio

Creating Consistently Realistic Characters: My Workflow

In this tutorial, I'll show how I take DAZ renders like this:

and turn them into this:

If you're looking to generate consistently high-quality, realistic characters, this is the workflow I use and recommend.

Tools You'll Need

To follow my workflow, you’ll need Automatic1111’s Stable-Diffusion-WebUI. While I had hopes for other platforms like Flux and Forge, they aren’t quite ready for this level of work yet. ComfyUI has potential, but I find it a bit tedious and unstable for my needs. Plus, its inpainting tools haven’t reached the quality required for this workflow (as of my last check several months ago).

System Requirements

You’ll need a powerful GPU with at least 8GB VRAM (ideally 24GB+), but if you’re a long-time DAZ user, I’m betting you already have that covered.
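If you want to check how much VRAM your card has before installing anything, here is a quick sketch that queries the NVIDIA driver's `nvidia-smi` tool (the query flags are standard; the 8GB threshold matches the minimum above):

```python
import subprocess

def parse_vram_mib(smi_output: str) -> int:
    # nvidia-smi's CSV output looks like "24576 MiB"; keep just the number.
    return int(smi_output.strip().split()[0])

def total_vram_mib() -> int:
    # Query total GPU memory via the NVIDIA driver's nvidia-smi utility.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader"],
        text=True)
    return parse_vram_mib(out)

if __name__ == "__main__":
    mib = total_vram_mib()
    print(f"GPU VRAM: {mib} MiB",
          "(meets 8GB minimum)" if mib >= 8192 else "(below 8GB minimum)")
```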

Steps to Get Started

1. Install Automatic1111’s Stable-Diffusion-WebUI

You can follow this YouTube tutorial for detailed installation instructions.

2. Download and Install the Model

For the best results, I recommend using this modified version of SD1.5: EpicPhotogasm.

  • At the top of the page, you’ll find different versions. I suggest using Z-Universal (not the inpainting version).
  • Download it and place it in your stable-diffusion-webui\models\Stable-diffusion directory.

3. Install ControlNet

Next, you’ll need ControlNet for additional control and detail in your generated images.

  1. Install ControlNet from this GitHub repository.
  2. Download some useful ControlNet models, such as:
    • IP-Adapter SD1.5/SDXL
    • IP-Adapter FaceID [SD1.5/SDXL]
    • Other modules like Canny and OpenPose are also helpful. You can find them here:
  3. For FaceID PlusV2 you will also need its LoRA. Download it from: https://huggingface.co/h94/IP-Adapter-FaceID/blob/main/ip-adapter-faceid-plusv2_sd15_lora.safetensor
    • Once downloaded, move it into: \stable-diffusion-webui\models\Lora

Note: The setup can feel a bit messy, but these modules will greatly improve your results.

4. Install Reactor (Optional)

For even more refinement, I highly recommend Reactor. It helps with reshaping faces and other fine details. You can download it from this GitHub repository.


Testing Your Setup

Once you have everything installed, here’s how to test it:

  1. Run your WebUI by launching webui-user.bat.
  2. In the top-left corner, switch the checkpoint model to the Z-Universal model you downloaded.

Image Generation with Img2Img

  1. Switch to the Img2Img tab.
  2. Upload the DAZ image you want to modify in the image input section.
  3. Configure the following settings:
    • Sampling Method: Use Restart (my recommended method).
    • Schedule Type: Use Automatic (if not available, select Karras).
    • Sampling Steps: Set between 33 and 40.
  4. Go to the Resize tab and click the yellow triangle icon to resize your image. If your image is larger than 1440px in width or height, scale it down to that or lower.
  5. Set the CFG Scale between 6 and 8 (for now, you can leave it at 7).
  6. Denoising Strength is key — start with 0.4 and adjust based on the results.
  7. Set the Seed to -1 for random seed generation.

Finally, hit Generate and see what you get!
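For anyone who prefers scripting these steps, the WebUI also exposes an HTTP API when launched with the `--api` flag. Below is a minimal sketch that sends the same Img2Img settings to the `/sdapi/v1/img2img` endpoint; the input/output file names are hypothetical, and the sampler name should match whatever your install shows in the dropdown:

```python
import base64
import json
from urllib import request

def build_img2img_payload(image_b64: str, prompt: str, negative: str) -> dict:
    # Mirrors the UI settings from the steps above.
    return {
        "init_images": [image_b64],
        "prompt": prompt,
        "negative_prompt": negative,
        "sampler_name": "Restart",
        "steps": 36,              # between 33 and 40
        "cfg_scale": 7,
        "denoising_strength": 0.4,
        "seed": -1,               # -1 = random seed
    }

if __name__ == "__main__":
    # Assumes the WebUI was started with --api on the default port.
    with open("daz_render.png", "rb") as f:  # hypothetical input file
        b64 = base64.b64encode(f.read()).decode()
    payload = build_img2img_payload(
        b64, "young woman with blonde hair", "cartoon, painting")
    req = request.Request("http://127.0.0.1:7860/sdapi/v1/img2img",
                          data=json.dumps(payload).encode(),
                          headers={"Content-Type": "application/json"})
    result = json.load(request.urlopen(req))
    with open("out.png", "wb") as f:
        f.write(base64.b64decode(result["images"][0]))
```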

controlnet1.png
789 x 924 - 411K
zz_a2_bd_002_a2.png
1440 x 1080 - 3M
bd_a1a3_p1.png
1440 x 1080 - 2M
zz_a2_bd-a1a3.png
1440 x 1080 - 2M
zz_c2_bd_a1a2.png
1440 x 1080 - 2M
Post edited by Richard Haseltine on

Comments

  • charlescharles Posts: 846
    edited October 10

    ControlNet Setup

    1. Open the ControlNet accordion tab.
    2. In ControlNet Unit 0, check Enable.
    3. Select IP-Adapter.
    4. From the model dropdown, choose either:
      • ip-adapter-faceid-plusv2_sd15, or
      • my preferred ip-adapter-plus-face_sd15.
    5. The preprocessor should automatically update to ip-adapter-auto.
    6. Set the Control Weight slider to around 0.6.
    7. Check Pixel Perfect (this hides the resolution slider; Pixel Perfect is not always the best choice, and I'll need to do a full-page guide later on what it is and why, but for now let's say it is).
    8. Check Upload Independent Control Image.

    If you only have one control image, you can upload it now. For best results, select Multi-Images if you have a profile set of different face angles. Add them if available.

    (I'll go over in the next post on ways to create profile packages like this)

    Click Generate to see the results.
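When driving the WebUI through its API instead of the browser, ControlNet units ride along in the `alwayson_scripts` section of the Img2Img payload. Here is a sketch of the unit described above; the field names follow the ControlNet extension's API, and the exact model string is an assumption (copy it from your own model dropdown):

```python
def controlnet_unit(image_b64: str) -> dict:
    # One ControlNet unit matching the UI steps above. The model filename
    # on your install may differ; use the name shown in the dropdown.
    return {
        "enabled": True,
        "module": "ip-adapter-auto",
        "model": "ip-adapter-plus-face_sd15",  # assumption: dropdown name
        "weight": 0.6,
        "pixel_perfect": True,
        "image": image_b64,
    }

def attach_controlnet(payload: dict, units: list) -> dict:
    # The ControlNet extension reads its units from alwayson_scripts.
    payload.setdefault("alwayson_scripts", {})["controlnet"] = {"args": units}
    return payload
```

Multiple control images (the Multi-Images feature) would mean multiple entries in the `units` list, one per face angle.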


    Reactor Setup (Optional)

    https://github.com/Gourieff/sd-webui-reactor

    If you installed Reactor, you can further refine the face. Here's how:

    1. Enable Reactor from its accordion tab.
    2. Add one image — usually, an angled or side profile works better than a straight-on shot.
    3. Uncheck Swap in Generated Image and check Swap in Source Image. This swaps the face at the beginning of the process, allowing ControlNet to apply its effects properly. If you swap in the generated image, it will override the ControlNet details.

    Click Generate to view the results.


    FacePop (Optional)

    https://github.com/TheCodeSlinger/FacePop

    If you installed FacePop, enable it from its accordion tab. That’s all you need to do, but if you want to explore the masks and tweaks FacePop applies, check Enable Debugging. In your output folder, you'll find a subfolder called Debugging with useful images like landmark detections, mask creations, head orientation corrections, and final overlays before processing.

    Note: The final FacePop image might not appear inside the WebUI. Instead, check the folder where the images are generated. I use a timestamp-based naming system instead of the typical seed and image count naming. Also, metadata reproduction in FacePop images isn’t fully implemented yet, but that should come in version 1.3.


    What’s Working Together?

    • Photo-realistic Model: I recommend EpicPhotogasm for great results. https://civitai.com/models/132632/epicphotogasm
    • Sampling Method: Use Restart, a hidden gem for sampling.
    • Denoising: I suggest a moderate setting of 0.4. You can adjust this, but keep in mind that higher values (around 0.5 to 0.6) will create more variation from the original image.
    • CFG Scale: Adjust this depending on how much of your prompt you want to apply.

    Adding Prompts

    Don't forget to add a prompt to describe the image and character. Start with the character’s description, then add the expression, pose, action, and setting. Example:

    Prompt:
    young woman with blonde hair wearing a toga (flirty smile) standing in parking lot at night

    For the Negative Prompt, it's important to list things you don’t want in the image. Example:

    Negative Prompt:
    cartoon, painting, illustration, (worst quality, low quality, normal quality:2), necklace, lipstick

    I often include necklace and lipstick because models tend to add them automatically. Adjust this based on what you find appearing in your images that you don’t want.


    Key Tools Working Together

    1. ControlNet: Using the Plus Face model with multiple input images from different profile shots.
    2. Reactor: Helps to reshape and refine the face.
    3. FacePop: Aligns and scales the face for optimal processing.

    If anyone needs more detailed help, feel free to PM me for a Discord invite, and I can guide you through troubleshooting.

     

    Post edited by charles on
  • Nyghtfall3DNyghtfall3D Posts: 776

    It should be noted that Stable Diffusion is heavily biased toward portraits, so if you plan to try this workflow on complex scenes that require characters to look in any direction but at the camera, you're going to quickly find it to be an exercise in futility.

  • charlescharles Posts: 846
    edited October 10

    Nyghtfall3D said:

    It should be noted that Stable Diffusion is heavily biased toward portraits, so if you plan to try this workflow on complex scenes that require characters to look in any direction but at the camera, you're going to quickly find it to be an exercise in futility.

    SD can do a LOT more than portraits, my friend; stay tuned as I continue to update these pages. It does have its weaknesses, but I will try to show how, when combined with Daz, you can overcome almost all of them. Eye direction is probably one of the most frustrating aspects, but there are prompts and tools to help correct that. Besides my FacePop tool, I'm working on a new method to restore the original eyes and process those separately (still a WIP), but there are at least four different methods I know of for fixing eye gaze. That's something I want to address later in this thread, along with glasses and fingers; I'll go over it all eventually. There is also a further option later on of switching to SDXL for detailing, which plays a bit nicer with eye gaze, but the image below is just SD and inpainting and a bit of PS.

    zz_a2_bd_001_a7aaa.png
    1440 x 773 - 2M
    Post edited by charles on
  • Nyghtfall3DNyghtfall3D Posts: 776
    edited October 10

    charles said:

    SD can do a LOT more than portraits my friend...

    Indeed, it can.

    Eye direction is probably one of the most frustrating aspects...

    That was my point.

    Your workflow outlines how to create consistent faces. SD is powerful, but when it comes to faces, getting character eyes to point in any direction but at the camera is like trying to reposition wallpaper after the glue has dried. You might achieve some minute measure of success after a few hours of prompting, but it probably won't look pretty, and then you'll spend another few hours trying to refine what you managed to wrangle from the model.

    EDIT: Nice work on the image. Now try getting either of them to look at the other, or have her look down at her coffee mug.

    Post edited by Nyghtfall3D on
  • charlescharles Posts: 846
    edited October 10

    .

    Post edited by charles on
  • charlescharles Posts: 846
    edited October 10

    Nyghtfall3D said:

    charles said:

    SD can do a LOT more than portraits my friend...

    Indeed, it can.

    Eye direction is probably one of the most frustrating aspects...

    That was my point.

    Your workflow outlines how to create consistent faces. SD is powerful, but when it comes to faces, getting character eyes to point in any direction but at the camera is like trying to reposition wallpaper after the glue has dried. You might achieve some minute measure of success after a few hours of prompting, but it probably won't look pretty, and then you'll spend another few hours trying to refine what you managed to wrangle from the model.

    EDIT: Nice work on the image. Now try getting either of them to look at the other, or have her look down at her coffee mug.

    Challenge accepted, but it's late, so that's something for tomorrow if I can get to it. I'm not limited to SD only; I will wield whatever tools are needed to achieve the goal. SD is the best for the base character processing, IMO, of anything I've seen so far, given what it does for skin specular, hair, and clothing. This workflow isn't one-click-and-done, and YES, it can sometimes take many tries to get the desired results, but the results can be amazing if one is willing to put the effort in. As I just posted, there is postwork to be done; I may take half a dozen or more images and stack them on top of one another to tweak out the best of the best. The process may be more than some want to put into it; that's up to them, and I'm just going to outline what I do. If people have new ideas to add, GREAT! I may learn a few new things along the way; I hope so!

    I promise no looking at birds though... that thing is pretty bad.

    Post edited by charles on
  • Nyghtfall3DNyghtfall3D Posts: 776

    Alas, postwork is where my skills end.  I know next to nothing about that.

  • charlescharles Posts: 846
    edited October 11

    Character Profile Package

    A character profile package is a collection of detailed images of your character that are used during the AI image generation process. This helps guide the AI more accurately. This technique is similar to face swapping or deep fakes, but with better accuracy because you'll have a more complete package of profile and angled shots of the character.

    There are several ways to create such a package—whether by using images from MidJourney (which I used for the profile of the woman in the example), real photos of a person, or software like DAZ 3D.

    Building a Character Package with DAZ 3D

    For this example, we’ll use DAZ 3D to create a character profile package. DAZ offers great control over character positioning, cameras, and lighting, allowing you to render precise profile and angle shots. You can customize your characters using a wide variety of pre-designed characters and morphs, giving you flexibility in creating your ideal character quickly.

    Here’s a quick step-by-step guide to building your character package:

    1. Set Up Your Character

    For this example, I'll use KOO Clyde G9 as the base character, with Qinfen Hair for Genesis 9 and Padded Armor as the outfit. Let's call this character Manuel.

    2. Choose a Well-Lit Environment

    I recommend using a well-lit HDRI environment, such as the “Almost White” set from Render Studio.

    3. Adjust Render Settings

    In the render settings:

    • Set the size to 720x720 pixels.
    • Create a basic camera for rendering.

    Note: The most common camera type used by Stable Diffusion and many AI models is 50mm, but in DAZ, I suggest leaving the default camera settings. The reason is that with close-ups, the 50mm lens might distort the face, and a 35mm lens can cause too much skew. The default DAZ camera settings seem to work best for this method. However, feel free to experiment with other settings to see what works best for your character.

    4. Position the Camera

    • Move the camera to face the character directly, head-on.
    • Frame the face, leaving about 10% margin from the top of the head and the bottom of the chin, with about 20% margin on the sides.
    • Include some of the neck and shoulders, but don’t crop the top or back of the head. Keeping the full head in the frame ensures the AI’s detection system doesn’t get confused.

    5. Lighting

    Disable the camera’s default headlamp as it can create unwanted lighting effects. Instead:

    • Create a Spotlight and set its Luminous Flux (Lumen) to around 20,000.
    • Position the spotlight just in front of the character, slightly above and to the right, angled towards the face.

    6. Additional Camera Angles

    Create additional cameras to capture profile shots from various angles:

    • Profile view: Directly from one side.
    • Angled shots: Capture 30° and 45° angles.
    • Over the shoulder: A shot from behind.

    7. Render Your Shots

    Render each shot with these cameras. Once rendered, you now have a solid character profile package.


    Bringing the Images into Stable Diffusion (Automatic1111’s WebUI)

    Now that we have the profile images, we can use Automatic1111’s Stable Diffusion WebUI to refine and enhance them.

    1. Load the Model

    Ensure the model you want to use is loaded. For this example, I'm using “epicphotogasm_zUniversal”.

    2. Open the Img2Img Tab

    1. Load the frontal image you rendered in DAZ into the Img2Img tab.
    2. Add a prompt like: 20-year-old Spanish man

    Note: It's essential to include age, gender, and ethnicity to help guide the AI. Most models default to young, white, brunette women, so this added detail is crucial for accurate character generation.

    Tip: You can include a name in the prompt (e.g., "Manuel Sanchez") to help guide the AI towards consistency. However, if the name is tied to a well-known individual in the dataset, the output might drift towards that person’s likeness, so use this tactic with caution.

    3. Add a Negative Prompt

    In the Negative Prompt field, include the following:

    cartoon, painting, illustration, (worst quality, low quality, normal quality:2), necklace, lipstick

    This helps to avoid unwanted artifacts such as cartoonish effects or unnecessary accessories.

    4. Adjust Settings

    • Sampling Steps: Set to around 40.
    • Sampling Method: Select Restart.
    • Schedule Type: If available, choose Automatic. If not, select Karras.
    • CFG Scale: Leave it at 7 for now.
    • Denoising Strength: Start at 0.32 to 0.4. A lower setting keeps the image close to the original, while a higher setting introduces more realism. Adjust based on your preference.

    Once you’re satisfied with the results, copy the seed from the information below the preview image. You can also review previous renders by clicking the folder icon under the preview.

    Tip: You can drag any image back into the image box and click PNG Info to retrieve its seed and settings for consistent results.
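The settings shown by PNG Info live in a text chunk named `parameters` that the WebUI embeds in every PNG it saves. If you want to pull the seed out in a script, here is a small parser sketch (the last-line `Steps: .., Sampler: .., CFG scale: .., Seed: ..` layout is the usual format, but verify against your own files):

```python
def parse_parameters(text: str) -> dict:
    """Parse the 'parameters' string A1111 embeds in its PNGs.

    Simplified format: prompt line(s), an optional 'Negative prompt:' line,
    then a final 'Steps: .., Sampler: .., CFG scale: .., Seed: ..' line.
    """
    lines = text.strip().splitlines()
    settings = {}
    for part in lines[-1].split(", "):
        if ": " in part:
            key, value = part.split(": ", 1)
            settings[key] = value
    return settings

# With Pillow installed, the raw string can be read like this:
# from PIL import Image
# text = Image.open("out.png").info.get("parameters", "")
```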

    5. Apply Img2Img to Other Angles

    Using the same settings, process the other profile and angle shots. You may need to adjust the denoising strength for the side or over-the-shoulder shots.
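If you have several angle shots, this step is easy to batch over the WebUI's API (started with `--api`). Here is a sketch that loops over a folder of renders and nudges the denoising strength per shot; the folder name, filename keywords, and strength values are assumptions to adapt to your own naming:

```python
import base64
import glob
import json
import os
from urllib import request

# Per-shot denoising overrides: side and over-the-shoulder renders often
# need a different strength than the frontal shot (starting points only).
DENOISE = {"front": 0.35, "side": 0.45, "shoulder": 0.45}

def denoise_for(filename: str) -> float:
    # Pick a strength based on a keyword in the filename (naming is hypothetical).
    for key, value in DENOISE.items():
        if key in os.path.basename(filename).lower():
            return value
    return 0.4  # default from the settings above

if __name__ == "__main__":
    for path in sorted(glob.glob("profile_package/*.png")):  # hypothetical folder
        with open(path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode()
        payload = {
            "init_images": [b64],
            "prompt": "20-year-old Spanish man",
            "negative_prompt": "cartoon, painting, illustration, necklace, lipstick",
            "sampler_name": "Restart",
            "steps": 40,
            "cfg_scale": 7,
            "denoising_strength": denoise_for(path),
        }
        req = request.Request("http://127.0.0.1:7860/sdapi/v1/img2img",
                              data=json.dumps(payload).encode(),
                              headers={"Content-Type": "application/json"})
        result = json.load(request.urlopen(req))
        with open(path.replace(".png", "_bumped.png"), "wb") as f:
            f.write(base64.b64decode(result["images"][0]))
```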

    6. Fine-Tune the Details

    If your character has additional features (e.g., a goatee or facial hair), make sure to mention them in your prompt. For example:

    Prompt: "20-year-old Spanish man, goatee"

    Add any relevant exclusions to the Negative Prompt, like:

    cartoon, painting, illustration, (worst quality, low quality, normal quality:2), necklace, lipstick, beard


    Dealing with Minor Details (Freckles, Moles, etc.)

    If minor details like freckles or moles get lost in the AI’s process, it’s challenging to recover them fully at this stage. The AI often treats these as noise and ignores them. However, in a future post, I’ll discuss how to use post-processing and advanced layering to bring back these small but important details.


    Conclusion

    By following these steps, you’ll have a collection of 3 or more head profile images that can be used in ControlNet’s IPAdapter Multi-Input feature to guide the AI in generating consistent, high-quality characters.

    Note: For your Profile Character Package, you don't want multiple shots of the exact (or nearly the same) angle; the one exception is possibly having both a full left and a full right side profile shot.

    Untitled-8.png
    1883 x 543 - 176K
    Untitled-7.png
    1828 x 1002 - 803K
    cid_man0.png
    1982 x 1841 - 382K
    cid_man02.png
    1982 x 1841 - 352K
    cid_man1.png
    1982 x 1841 - 401K
    daz_bump_maneul.png
    3587 x 1438 - 7M
    Post edited by charles on
  • fred9803fred9803 Posts: 1,564

    Thanks. Great info charles!

    An alternative to "Building a Character Package" would be to use the images as a dataset in Kohya to train a LoRA of the character. Personally I'd find that easier to do.

  • charlescharles Posts: 846
    edited October 11

    fred9803 said:

    Thanks. Great info charles!

    An alternative to "Building a Character Package" would be to use the images as a dataset in Kohya to train a LoRA of the character. Personally I'd find that easier to do.

    You can, and I will get to that later, but it's usually not as flexible as this method: you can swap out faces a bit quicker, I find the IP-Adapter does a bit better job, and you can control the weights in ways you can't really with LoRAs. There's also nothing that says you can't combine LoRAs with this technique. But even with a LoRA you need your profile packages for it, unless you are creating them some way I'm unaware of. I believe you need more like 30+ profile pics of the character for good LoRA training, and several hundred style images if I recall. Also, if I recall, LoRAs work best at image resolutions that match their training resolution. So if you train on 512x512 and want to use it on, say, an image you made in Daz at 1440x1080, it won't work as well. But maybe I'm wrong, or things have changed; it's been almost a year since I last worked with LoRAs.

    If you would like to write a tutorial for this thread or link it here for how you do your Lora training that would be great.

    Post edited by charles on
  • charlescharles Posts: 846
    edited October 11

    Character Profiles Packages with Still Portraits and KlingAI

    Kling AI (klingai.com) is a text-to-video and image-to-video AI platform. Recently, other free alternatives like HailuoAI (hailuoai.video) have emerged, alongside tools like Runway Gen-3. While this process isn't as precise as using DAZ 3D, it can still help generate useful character profile images.

    For this example, I will use an ArtBreeder image to demonstrate the process.

    Preparing the Image for Use

    This image has a lot of potential, but the colors are a bit off. It’s common to need to do some color correction, resizing, or other adjustments before you begin working with the image in AI tools.

    [Fixing the Image]

    In this case, I’m going to bring the image into Photoshop to fix the color. You can follow these steps:

    1. Auto-Color Correction:

      • In Photoshop, go to Image > Auto Color.
      • Optionally, you can also use Image > Auto Tone to further balance the image.
    2. Resize the Image:

      • If the image is too small or too large, resize it to a size between 720x720 and 1024x1024. It doesn't need to be a perfect square, but try to keep it close to square dimensions.
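The resize rule above (longest side landing between 720 and 1024, aspect ratio preserved) can be computed with a small helper; the Pillow call in the comment is one way to apply it:

```python
def clamp_size(width: int, height: int, lo: int = 720, hi: int = 1024) -> tuple:
    """Scale (width, height) so the longest side lands in [lo, hi],
    preserving the aspect ratio. Returns the new (width, height)."""
    longest = max(width, height)
    if longest > hi:
        scale = hi / longest
    elif longest < lo:
        scale = lo / longest
    else:
        return width, height
    return round(width * scale), round(height * scale)

# With Pillow installed:
# from PIL import Image
# img = Image.open("portrait.png")  # hypothetical file
# img = img.resize(clamp_size(*img.size), Image.LANCZOS)
```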


    Outpainting for More Padding

    While the image is decent, it lacks enough padding around the head, neck, and shoulders. To create a better profile package, it’s helpful to include some of the neck and shoulders with a bit of extra space around the head.

    To add this padding, we’ll use a technique called Outpainting in Automatic1111’s Stable Diffusion WebUI.

    Steps for Outpainting:

    1. Img2Img Setup:

      • Load the image into the Img2Img tab.
      • Use a simple prompt like: "Mature woman, bare neck and shoulders".
      • For the Negative Prompt, I recommend: "necklace" (to avoid adding unnecessary accessories).
    2. Adjust the Settings:

      • Make sure no other features like ControlNet, FacePop, or Reactor are active.
      • Set the CFG Scale to 7.
      • Increase the Denoising Strength to around 0.75.
    3. Activate Outpainting:

      • At the bottom of the WebUI, there is a Script dropdown. Select “Poor Man’s Outpainting”.
      • Set the Pixels to Expand to around 64, then click Generate.
    4. Troubleshooting:

      • If you notice a rigid line between the original image and the outpainted areas, increase the Blur setting from 4 to 8, and gradually up to 12 or higher if needed.

    It may take several tries, but I got a decent result after about six attempts.


    Further Touchup with Inpainting

    Now that we’ve outpainted the image, we have an issue with earrings—they don't match and it would be better if the character had none. Here’s how to remove them using Inpainting:

    1. Inpainting Setup:

      • Set the script dropdown back to None (we’re done with outpainting).
      • Drag the outpainted image into the main Img2Img box.
      • Click the Inpaint button below the image to switch to the Inpainting tab.
    2. Mask the Area:

      • In the top-right corner of the image box, you’ll see a slider that adjusts the brush size.
      • Paint over the earrings with the brush to mark the areas you want to modify.
    3. Add the Prompt:

      • Prompt: "Mature woman ears".
      • Negative Prompt: "earrings, jewelry".
    4. Adjust Settings:

      • Set Sampling Steps to 40 and CFG Scale to 7.
      • Set the Denoising Strength to around 0.6, then click Generate.

    Repeat as needed to get the desired result. If the earrings don’t disappear, increase the Denoising Strength until they are removed.
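Inpainting can also be scripted: the same `/sdapi/v1/img2img` API endpoint accepts a mask image alongside the init image. Here is a payload sketch matching the settings above; the mask convention (white = repaint) and the fill-mode value follow the WebUI's masked-content options, but double-check them on your version:

```python
def build_inpaint_payload(image_b64: str, mask_b64: str,
                          prompt: str, negative: str,
                          denoise: float = 0.6) -> dict:
    # /sdapi/v1/img2img performs inpainting when a mask is supplied.
    return {
        "init_images": [image_b64],
        "mask": mask_b64,       # white = areas to repaint, black = keep
        "mask_blur": 4,
        "inpainting_fill": 1,   # assumption: 1 = "original" masked content
        "prompt": prompt,
        "negative_prompt": negative,
        "steps": 40,
        "cfg_scale": 7,
        "denoising_strength": denoise,
    }
```

Raising `denoise` here corresponds to the "increase the Denoising Strength until they are removed" advice above.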


    Touching Up the Background

    To further clean up the image, you can use Inpainting to refine the background:

    1. Mask the Background:

      • Just like before, use the Inpaint tool to draw over the background surrounding the character.
    2. Add the Prompt:

      • Prompt: "Mature woman in white room".
      • Negative prompt isn’t necessary, but you could add: "plant, flower, furniture, door, window, people, person, hands" to avoid unwanted details.
    3. Adjust Settings:

      • Set the Denoising Strength to around 0.9 and generate the image.

    This will help simplify and clean up the overall appearance of the image.


    Once You're Signed Up and Logged Into Kling (or Another Platform)

    Now that the image is prepared, here’s how to create a character profile using platforms like Kling AI or HailuoAI:

    1. Switch to Image-to-Video Mode:
      Upload the ArtBreeder or processed image into the image-to-video tool.

    2. Create Prompts for Head Movement:
      Use prompts like:

      • "woman turning away"
      • "woman looks behind"
      • "woman turning to the side"

      The goal is to generate a video where the character turns their head completely or moves to the side, without blurriness.

    3. Enhance Output Quality:
      In Kling, you can upgrade to Professional Mode to generate higher-resolution videos.

    4. Adjust Creativity and Relevance:
      Adjust the Creativity/Relevance slider:

      • More Creative: Allows for more head movement but may introduce unnatural or exaggerated movements.
      • More Relevant: Restrains the creativity but may limit the range of movement.

      Results can vary, so you may need multiple attempts. HailuoAI is currently free with no processing limits, but you can only work with one image at a time, and the queue might be long.


    Good Image Selection

    When dealing with video-generated images, ensure that you select crisp, detailed images of the face. Avoid using motion-blurred images, as they won't work well for further refinement.

    (example of bad image, too much pixel blur)


    Capturing and Processing Screenshots

    Once you have the video of the character turning, you'll want to extract still images from it to create your Character Profile Package. Here's how to do that using a paint program or the Windows Snipping Tool:

    Option 1: Using a Paint Program (e.g., Photoshop)

    1. Capture the Screenshot:

      • Press the Print Screen (PrtScn) button on your keyboard to capture the entire screen.
    2. Open Photoshop:

      • Launch Photoshop on your computer.
    3. Create a New Document:

      • Go to File > New or press Ctrl + N (Cmd + N on Mac).
      • Photoshop will automatically match the dimensions of the new document to your clipboard screenshot. Click OK.
    4. Paste the Screenshot:

      • Press Ctrl + V (Cmd + V on Mac) to paste the screenshot.
      • Alternatively, go to Edit > Paste.

    Option 2: Using the Windows Snipping Tool

    1. Open the Snipping Tool:

      • Search for Snipping Tool in the Start menu.
    2. Capture the Screenshot:

      • Click New and select the area you want to capture from the video.
    3. Save the Screenshot:

      • Click the floppy disk icon or go to File > Save As to save the screenshot in your desired location and format (e.g., PNG, JPEG).
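As an alternative to manual screenshots, still frames can be pulled straight from the video file. This isn't part of the original workflow, just a convenience sketch: a helper that picks evenly spaced frame numbers, with an OpenCV usage example (hypothetical file names) in the comments:

```python
def frame_indices(total_frames: int, count: int) -> list:
    """Evenly spaced frame numbers across the clip (endpoints included)."""
    if count <= 1 or total_frames <= 1:
        return [0]
    step = (total_frames - 1) / (count - 1)
    return [round(i * step) for i in range(count)]

# With OpenCV installed (pip install opencv-python):
# import cv2
# cap = cv2.VideoCapture("turning_head.mp4")       # hypothetical clip
# total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
# for n, idx in enumerate(frame_indices(total, 8)):
#     cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
#     ok, frame = cap.read()
#     if ok:
#         cv2.imwrite(f"frame_{n:02d}.png", frame)
```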

    Creating a Character Profile Package

    Once you've captured screenshots of the character's head in different positions (front, side, angled), follow these steps:

    1. Use the Time Slider:

      • In your video editor, use the time slider to capture different angle shots. You'll want a few more side profiles, in addition to the original portrait image.
    2. Crop the Images:

      • Crop the images so they focus on the head, neck, and shoulders (if visible).
      • Keep the images roughly square and consistent in dimension for uniformity; they don't have to be exactly square.
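If you're cropping many screenshots, the centered square crop can be computed once and reused; the Pillow call in the comment shows how it would be applied:

```python
def center_square_box(width: int, height: int) -> tuple:
    """Largest centered square crop box as (left, top, right, bottom),
    the tuple Pillow's Image.crop expects."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side

# With Pillow installed:
# from PIL import Image
# img = Image.open("frame_03.png")  # hypothetical screenshot
# img.crop(center_square_box(*img.size)).save("profile_03.png")
```

In practice you'd nudge the box toward the head rather than the exact center, but this gives a consistent starting point.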

    Character Profile Package

    From a single frontal ArtBreeder image, we've created a detailed character package with multiple angles and a full profile that looks AMAZING!

    UPDATE: The more I test HailuoAI, it seems to do a better job following prompt directions than KlingAI.

    cid_xx_pro1a.png
    3587 x 723 - 3M
    ab18.png
    2048 x 2048 - 2M
    cid_cc1.png
    2048 x 2048 - 3M
    cid_ee1_720_op.png
    832 x 832 - 781K
    cid_ii1.png
    928 x 825 - 598K
    cid_ii2.png
    928 x 825 - 520K
    cid_ii3.png
    794 x 1852 - 663K
    cid_cc0.png
    720 x 723 - 494K
    Post edited by charles on
  • charlescharles Posts: 846
    edited October 12

    Character Profile Packages with MidJourney

    We’ve already covered how to create character packages with DAZ 3D and Kling, but what about using generative image AI? You can use Stable Diffusion (SD), SDXL, Flux, DALL-E, or whatever platform you prefer, but for this tutorial, I’ll focus on MidJourney, because it has some unique tools that make this process smoother compared to the others.

    The woman in the very first example of this thread was created using this technique in MidJourney.

    For this example, I’ll create something I haven’t done before: a fantasy character.

    Step-by-Step Guide

    1. Connecting to MidJourney and Generating the Initial Image

    Connect to MidJourney through Discord and use the following prompt:


    /imagine photo portrait shot of a 30-year-old Elven woman, detailed skin, fine details, in a white room. 50mm camera lens --no necklace, earrings, jewelry --ar 4:5

    [Explaining the Prompt]

    • I start the prompt by specifying that I want a "photo" and a "portrait".
    • Next, I describe the subject (the Elven woman) in detail.
    • I include "white room" or "white/black background" to avoid noisy or distracting backgrounds.
    • The 50mm camera lens is a good choice since most AI models are trained using images captured with a 50mm lens, which is ideal for portrait shots.
    • --no: I exclude things like jewelry or accessories because we want a clean headshot.
    • --ar 4:5: This sets the aspect ratio of the image to 4:5, which is great for portrait shots.

    After about a dozen tries, I finally found a version I liked. It’s not a perfect frontal image and is a bit too close, but that’s fine. I’ll click Zoom Out 2x in MidJourney.

    2. Upscaling the Image

    Once the zoomed-out images are generated, I’ll select the best option by choosing U# to upscale it. After the upscale is done, click the image, and just below the image on the left, click Open in Browser to open the full-sized image in a new tab.

    Keep this tab open, then return to Discord.

    3. Generating Different Angles with MidJourney

    Next, I want to generate different head angles:


    /imagine photo realistic character reference, 4 angles of head including front and side profile, of Elven girl, white background, color, 50mm camera lens --cref (paste the image URL here)

    • --cref is a recent addition to MidJourney, and it stands for Character Reference. It uses the URL of the original image to generate variations based on that image.

    Before hitting return, go back to the image tab, copy the URL of the full-sized image, and paste it after --cref.

    The goal is to generate a variety of head angles, including frontal and side profiles. It’s okay if not every image comes out perfect—we’ll refine it later. After a few tries, I ended up with some good results.

    4. Cropping and Resizing the Images

    Now I’ll crop each headshot into individual images and resize them to around 720x720 pixels. In PhotoPaint (or your preferred software), resize the largest dimension to 720 and keep the aspect ratio locked.

    Some of the images might lack padding at the top of the head or around the shoulders. To fix that, we’ll use Outpainting in Stable Diffusion WebUI.


    Outpainting with Stable Diffusion

    1. Drag one of the images into Stable Diffusion WebUI, under the Img2Img tab.

    2. Use this prompt:

    • Prompt: Elven girl, bare neck and shoulders, white background
    • Negative prompt: plant, flower, furniture, door, window, people, person, hands
    3. Make sure the image dimensions are correct by clicking the yellow triangle in the Resize To box.

    4. Set the Script dropdown to Poor Man’s Outpainting. Set Pixels to Expand to 48 and Mask Blur to around 8.

    5. You may need several attempts to get decent results. Increasing Sampling Steps to around 60+ can help.

    Repeat this process for any images that need padding or adjustments.
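    If you have a batch of images to pad, the same outpainting step can be driven through the WebUI API (launch A1111 with the --api flag). This is only a sketch: the script_args order (pixels, mask blur, fill mode, directions) is an assumption based on the Poor Man's Outpainting script's parameters and is not a stable contract across WebUI versions, so verify against your install before relying on it.

```python
import base64

# Sketch of the outpainting request for A1111's /sdapi/v1/img2img
# endpoint. The script_args order is an assumption; check it against
# your WebUI version before use.
def outpaint_payload(image_bytes, pixels=48, mask_blur=8):
    return {
        "init_images": [base64.b64encode(image_bytes).decode()],
        "prompt": "Elven girl, bare neck and shoulders, white background",
        "negative_prompt": "plant, flower, furniture, door, window, "
                           "people, person, hands",
        "steps": 60,  # 60+ helps, per the tip above
        "script_name": "Poor man's outpainting",
        "script_args": [pixels, mask_blur, "fill", ["up"]],
    }

# POST this as JSON to http://127.0.0.1:7860/sdapi/v1/img2img
```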

     


    Refining the Profile Images

    Our original image looked very photorealistic, but the profile shots might not match that level of realism—this is common when using --cref in MidJourney. To fix this, we’ll refine these images by using the AI Bump technique, much like we did with the DAZ example (Manuel). We’ll also use the original photorealistic image as a reference guide.

    1. Prepare the Original Image

    First, crop and resize the original image if it’s too large.

    Set the Script dropdown to None to turn off outpainting.

    2. Refining the Profile Images

    1. Load one of the profile images into the Img2Img picture box.

    2. Use this prompt:

    • Prompt: Photo realistic image of a pale-skinned Elven woman with freckles
    • Negative prompt: cartoon, painting, illustration, (worst quality, low quality, normal quality:2), necklace

    Notice that I left lipstick out of the negative prompt this time because I want the lips to retain some color.

    3. Adjust the settings:

      • Sampling Method: Restart
      • Schedule Type: Automatic or Karras
      • Sampling Steps: 30-40
      • CFG Scale: 7
      • Denoising Strength: 0.4
    4. Make sure no other extensions are active.

    3. Using ControlNet for More Precision

    1. Open the ControlNet accordion box.
    2. On Unit 0, check Enable.
    3. Select IP-Adapter.
    4. Check Pixel Perfect.
    5. Check Upload Independent Control Image and upload the original photorealistic image.
    6. Preprocessor: ip-adapter auto
    7. Model: ip-adapter-faceid-plus_sd15
    8. Control Weight: Set to around 0.8.

    Click Generate.
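    For repeat runs, the settings and ControlNet unit above can also be expressed as a single API payload. Treat this as a sketch: it assumes the WebUI --api flag and the ControlNet extension's alwayson_scripts format, and the module name "ip-adapter_clip_sd15" is my stand-in for the UI's "ip-adapter auto"; confirm the exact module/model strings on your install (the ControlNet extension exposes a module list endpoint).

```python
import base64

# Sketch: the img2img refinement settings plus the ControlNet
# IP-Adapter unit as a /sdapi/v1/img2img payload. Module and model
# names are assumptions; list yours via the ControlNet API first.
def refine_payload(profile_png: bytes, reference_png: bytes):
    return {
        "init_images": [base64.b64encode(profile_png).decode()],
        "prompt": "Photo realistic image of a pale-skinned "
                  "Elven woman with freckles",
        "negative_prompt": "cartoon, painting, illustration, "
                           "(worst quality, low quality, normal quality:2), "
                           "necklace",
        "sampler_name": "Restart",
        "steps": 35,
        "cfg_scale": 7,
        "denoising_strength": 0.4,
        "alwayson_scripts": {
            "controlnet": {
                "args": [{
                    "enabled": True,
                    "pixel_perfect": True,
                    "module": "ip-adapter_clip_sd15",  # "ip-adapter auto" in the UI
                    "model": "ip-adapter-faceid-plus_sd15",
                    "weight": 0.8,
                    "image": base64.b64encode(reference_png).decode(),
                }],
            }
        },
    }
```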


    Tips for Fine-Tuning

    • If the source image looks too cartoonish, try increasing the CFG Scale to 8 and the Denoising Strength to 0.52.

    • To reduce glossiness, add these keywords to the prompt:
      matte skin, (natural skin), (soft lighting)
      and these to the negative prompt:
      (glossy skin), (oily skin)

    • You can also try switching to the ip-adapter-faceid-plusv2_sd15 model for different results. If you use this model, include the matching LoRA. In the LoRA tab under the prompts section, select the ip-adapter-faceid-plusv2_sd15_lora. The prompt will automatically update to include <lora:ip-adapter-faceid-plusv2_sd15_lora:1>.

    If you don’t have the LoRA, download it from this link and place it in \stable-diffusion-webui\models\Lora. Then refresh the LoRA tab to load the new model.


    Final Result

    In the end, you should have a good collection of head angles for IPAdapter to work with:

    Note: The ears might not match the original reference exactly, but for our purposes, they are close enough.

    Post edited by charles on
  • charlescharles Posts: 846
    edited October 13

    FacePop in Depth

    FacePop is an extension for Automatic1111's Stable Diffusion WebUI and ForgeUI that enhances AI img2img processing by tackling the challenges of facial inconsistency when characters appear at different distances and orientations within images. It employs advanced face detection to locate faces—even small or tilted ones—and then crops, upscales, and rotates them to an upright position. By processing these faces separately and reintegrating them into the original image with masking to prevent double processing, FacePop aims to preserve the consistency of characters across various image compositions.

    Analyzing the Example Images of Manuel

    Consider three rendered images of Manuel at a resolution of 1080x810 pixels (a 4:3 ratio), each capturing him in a cocky stance with his head tilted approximately 15 degrees to his left.

    1. Close-Up Shot:

      • Face Detection Box: Approximately 494x494 pixels.
      • Description: Manuel's face occupies a significant portion of the frame, displaying detailed facial features such as skin texture, eye color, and subtle expressions.

    2. Medium Shot:

      • Face Detection Box: Approximately 165x165 pixels.
      • Description: Manuel's upper body is visible, with his face smaller in the frame. Key facial features are still discernible but less detailed.

    3. Long Shot:

      • Face Detection Box: Approximately 64x64 pixels.
      • Description: Manuel appears full-body within the scene, and his face occupies a minimal area of the image, resulting in a significant loss of facial detail.

    These images serve as practical examples to explore how varying face sizes and orientations impact the performance of AI models during img2img processing.
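    To put those numbers in perspective, here is how much of the 1080x810 frame each detection box actually covers:

```python
# Fraction of the 1080x810 frame covered by each face detection box.
FRAME = 1080 * 810

for name, side in [("close-up", 494), ("medium", 165), ("long", 64)]:
    pct = 100 * side * side / FRAME
    print(f"{name}: {side}x{side} box = {pct:.1f}% of the frame")
```

    The long-shot face holds fewer than 0.5% of the image's pixels, which is the root of everything discussed below.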


    Technical Issues Illustrated by the Example Images

    Let's set up Automatic1111's Stable Diffusion WebUI to do an AI Bump using IPAdapter and our Character Profile Package of Manuel.

    You can see my configuration above. One thing to note: denoising is set low, at 0.32, for the SD1.5 model. The lower this value, the less the AI Bump will drift the final image away from the original. Other models handle denoising and CFG differently, and 0.32 might be considered too high for them. Check the recommended settings for whatever model you're using, and experiment.

    Configure ControlNet with IP-Adapter PlusFace and use our Character Profile Package for Manuel for guidance.

    Here are our results without FacePop:

    Even though we used ControlNet and IP-Adapter, the results are less than amazing and not consistent with Manuel.

    Let's try this again with FacePop active. We will also open and enable the Separate Face Processing section. Activating it lets us override the default WebUI settings when processing the face: we can specify Sampling Steps, CFG Scale, and Denoising, as well as a different Prompt and Negative Prompt (note: even when enabled, if the prompts are left blank they will use the default ones from above).

    By default, Output Faces is enabled, which includes the face-only processed image in the output folder along with the final processed image.

    Final Image

    The results of the AI Bump are probably not so obvious with the Long Shot, except to demonstrate that using these techniques without proper scaling and alignment gives inconsistent, poor-quality results.

    So let's repeat the exact same process, just swapping in the Medium and Close-Up shots.

    Medium Final Results.

     

    Close Up Shot

    1. Impact of Face Resolution on Feature Extraction

    • Close-Up Shot (494x494 pixels):

      • High Detail Availability: The abundance of pixels allows the AI model to capture intricate facial features, enabling accurate recognition and processing.
      • Effective Convolutional Processing: Convolutional Neural Networks (CNNs) can generate detailed feature maps, leading to high-quality outputs during img2img transformations.
    • Medium Shot (165x165 pixels):

      • Moderate Detail Loss: While primary features like eyes, nose, and mouth are visible, finer details such as skin texture and subtle expressions begin to blur.
      • Reduced Feature Map Quality: CNNs receive less information, resulting in less detailed feature representations and potential degradation in output quality.
    • Long Shot (64x64 pixels):

      • Significant Detail Loss: Critical facial features merge or become indistinct. The low pixel count hampers the AI's ability to process the face accurately, often leading to poor or unrecognizable outputs.

    NOTE: Even with FacePop's upscaling of the face, there will still be significant detail loss, because pixelation has already destroyed that information. The only way to really overcome this is to render the image at a larger resolution.

    2. Challenges with Face Detection and Landmark Localization

    • Close-Up Shot:

      • High Detection Accuracy: Face detection algorithms easily identify Manuel's face, and landmark localization is precise due to the high resolution.
      • Accurate Alignment: The model can align and process the face correctly, preserving Manuel's distinct features.
    • Medium Shot:

      • Reduced Detection Accuracy: The smaller face size may introduce minor errors in detecting facial landmarks, affecting alignment and potentially introducing inconsistencies in the generated image.
    • Long Shot:

      • Detection Failures: The face may fall below the minimum size threshold required by detection algorithms, leading to failure in identifying the face.
      • Incorrect Landmark Placement: Even if detected, low resolution can result in inaccurate landmark localization, causing distortions in the processed image.

    3. Effects of Head Orientation

    • Head Tilt at 15 Degrees:
      • Alignment Complexity: The tilt introduces rotational variance, making it more challenging for face detection and alignment algorithms, especially those trained predominantly on upright faces.
      • Impact on Different Shots:
        • Close-Up: High resolution mitigates orientation challenges, allowing for correct alignment.
        • Medium and Long Shots: The combination of lower resolution and head tilt increases the difficulty, leading to potential misalignments and inconsistent outputs.

    4. Information Loss Demonstrated

    • Nyquist Sampling Theorem Implications:

      • Close-Up Shot: Satisfies the sampling requirements, capturing high-frequency details essential for accurate reconstruction.
      • Long Shot: Violates sampling conditions due to insufficient pixels, resulting in aliasing and significant information loss.
    • Pixelation and Quantization Errors:

      • Long Shot: Each pixel represents a larger area, causing pixelation that obscures fine details and introduces quantization errors, which the AI may misinterpret during processing.

    Inconsistencies Observed in AI img2img Processing

    1. Variability in Generated Outputs

    • Close-Up Shot:

      • Consistent Facial Features: The AI produces a detailed and accurate representation of Manuel, maintaining consistency with his original appearance.
    • Medium Shot:

      • Slight Deviations: Minor inconsistencies may arise due to reduced detail, such as subtle changes in facial proportions or expressions.
    • Long Shot:

      • Significant Inconsistencies: The AI struggles to reconstruct Manuel's face accurately, potentially altering key features or even misrepresenting his identity.

    2. Artistic Style Fluctuations

    • Style Transfer Issues:
      • The AI may apply different artistic styles or fail to maintain stylistic consistency across shots due to insufficient detail guiding the transformation, especially in the Long Shot.

    3. Pose and Expression Misinterpretation

    • Misalignment Effects:
      • In lower-resolution images with head tilt, the AI might misinterpret Manuel's pose or expression, leading to unnatural or unintended results in the processed images.

    4. Amplification of Artifacts

    • Noise Introduction:

      • Low-resolution images are more susceptible to noise, which the AI may incorrectly amplify or incorporate into the output, degrading image quality.
    • Distortion Effects:

      • Errors in face detection and alignment can cause distortions, such as warped facial features or disproportionate elements in the generated images.

    Technical Factors Contributing to Observed Issues

    1. Convolutional Neural Network Limitations

    • Receptive Field Challenges:

      • In the Long Shot, the receptive fields of CNN neurons cover too large an area relative to facial features, leading to overgeneralization and loss of critical details.
    • Pooling Layer Effects:

      • Downsampling layers further reduce spatial dimensions, exacerbating the loss of already limited information from low-resolution inputs.

    2. Face Detection Algorithm Thresholds

    • Minimum Size Constraints:

      • Detection algorithms may require faces to be above a certain pixel size (e.g., 80x80 pixels). The Long Shot's 64x64 face falls below this threshold, resulting in detection failure.
    • Orientation Biases:

      • Algorithms may be less effective at detecting tilted faces, especially at lower resolutions, due to training biases towards upright faces.
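    The minimum-size gate is easy to reason about numerically. A sketch (the 80x80 figure is the illustrative threshold from above, not a universal default; in OpenCV, for example, it corresponds to the minSize argument of detectMultiScale):

```python
# Minimum-size gate that many face detectors apply before reporting
# a face; boxes below it are silently dropped.
MIN_FACE = 80  # illustrative threshold from the text above

def detectable(face_w, face_h, min_size=MIN_FACE):
    return face_w >= min_size and face_h >= min_size

for name, side in [("close-up", 494), ("medium", 165), ("long", 64)]:
    print(name, detectable(side, side))  # only the long shot fails
```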

    3. Generative Adversarial Network (GAN) Challenges

    • Discriminator Limitations:

      • In low-resolution scenarios, the discriminator may not effectively guide the generator towards accurate facial reconstructions, leading to poor-quality outputs.
    • Mode Collapse Risks:

      • The generator might produce repetitive or generic faces when it cannot extract sufficient distinguishing features from the input, particularly evident in the Long Shot.

    How FacePop Works to Improve AI img2img Processing

    FacePop is a specialized tool designed to tackle the challenges of AI img2img processing when characters appear at different distances and orientations within images. It enhances the quality and consistency of facial features in generated images, ensuring characters like Manuel look accurate and recognizable across all shots. Here's how FacePop achieves this:


    1. Aggressive Face Detection

    • Comprehensive Scanning: FacePop uses an advanced face detection algorithm that aggressively scans the entire image to locate any faces, no matter how small or tilted they are.

    • Why This Matters: Traditional face detection might miss small or angled faces, especially in long shots. FacePop's method ensures that every face in the image is detected for processing.


    2. Cropping and Upscaling the Face

    • Cropping the Face: Once a face is detected, FacePop crops it out of the original image.

    • Upscaling the Face: The cropped face is then upscaled to a default size of 720x720 pixels. Users can adjust this size based on their preferences. The aspect ratio is maintained by default to avoid stretching or squashing the face.

    • Adding Padding: An additional padding is included around the face—by default, this is 35% of the 720x720 size, but users can customize this. Padding ensures that the surrounding areas (like hair or parts of the neck) are included, which helps in seamless reintegration later.

    • Why This Matters: Upscaling increases the number of pixels representing the face, providing more detail for the AI to work with. Padding ensures context is preserved around the face.
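    The crop-and-pad arithmetic can be sketched as follows (the coordinates are illustrative, and FacePop's actual implementation may clamp and round differently):

```python
# Expand a detected face box by a padding fraction (35% by default),
# clamp to the image bounds, then compute the upscale factor needed
# to reach the 720 px working size.
def padded_crop(x, y, w, h, img_w, img_h, pad=0.35):
    px, py = int(w * pad), int(h * pad)
    left, top = max(0, x - px), max(0, y - py)
    right = min(img_w, x + w + px)
    bottom = min(img_h, y + h + py)
    return left, top, right, bottom

# Long-shot face: a 64x64 box at (500, 200) in a 1080x810 frame
box = padded_crop(500, 200, 64, 64, 1080, 810)
crop_w = box[2] - box[0]
print(box, round(720 / crop_w, 1))  # roughly a 6.7x upscale
```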


    3. Correcting Face Orientation with Mediapipe

    Note: Why blue? At this stage the extension has converted the image to the OpenPose module's native BGR format, which differs from standard RGB. The image shows the detected, rotated upright position, while the red dots mark the facial landmarks' original, non-rotated locations.

    • Using Mediapipe's Landmark Detection: FacePop employs Mediapipe, a powerful tool that detects facial landmarks such as the eyes and nose.

    • Rotating to Upright Position: By analyzing the positions of the eyes and nose, FacePop calculates the angle of the head tilt. It then rotates the upscaled face so that it's upright.

    • Why This Matters: Aligning the face to an upright position simplifies the AI's task, as most models are trained on upright faces. This improves the accuracy of facial feature processing.
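    The tilt estimate itself is simple trigonometry on two eye landmarks. A sketch (the coordinates are made up for illustration; Mediapipe's actual landmark indices and coordinate conventions vary by model):

```python
import math

# Estimate head tilt from the line between the two eye centers;
# rotating the crop by -angle makes the face upright.
def tilt_degrees(left_eye, right_eye):
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

# Eyes from a head tilted roughly 15 degrees (illustrative points)
angle = tilt_degrees((300, 350), (420, 382))
print(round(angle, 1))  # ~14.9
```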


    4. Separate AI Processing of the Face

    • Isolated Processing: The upscaled and rotated face is processed separately through the generative AI model (like Stable Diffusion).

    • Enhanced Quality: Because the face is now larger, upright, and isolated, the AI can focus on enhancing details, correcting imperfections, and applying styles more effectively.

    • Why This Matters: Processing the face separately ensures that the AI has the best possible input to work with, leading to higher-quality and more consistent results.


    5. Generating a Face Mask

    • Creating the Mask: Alongside processing the face, FacePop generates a mask—a black-and-white image that indicates where the face is located within the original image.

    • Alignment with Original Image: The mask is carefully aligned so that when the processed face is placed back, it fits perfectly over the original face area.

    • Why This Matters: The mask ensures that only the face area is updated in the final image, preventing any overlap or double processing.


    6. Reintegration into the Original Image

    • Restoring the Face: After processing, the enhanced face is rotated back to match the original angle and scaled down to fit the original image size.

    • Seamless Blending: Using the mask, the processed face is overlaid onto the original image, replacing the old face without affecting the rest of the image.

    • Preventing Double Processing: By processing the face separately and using the mask during the final image generation, FacePop ensures that the face isn't processed twice, which could cause distortions or inconsistencies.

    • Why This Matters: This step ensures that the improvements made to the face are integrated smoothly, maintaining the integrity of the original image while enhancing the character's facial features.
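    The paste-back in steps 5 and 6 is ordinary mask compositing. A toy sketch on 2x2 grayscale "images" (real code would use PIL's Image.paste with a mask argument, or an equivalent; the logic is the same):

```python
# Where the mask is white (1), take the processed face pixel;
# elsewhere keep the original pixel untouched.
def composite(original, face, mask):
    return [
        [f if m else o for o, f, m in zip(orow, frow, mrow)]
        for orow, frow, mrow in zip(original, face, mask)
    ]

orig = [[10, 10], [10, 10]]
face = [[99, 99], [99, 99]]
mask = [[1, 0], [0, 1]]
print(composite(orig, face, mask))  # [[99, 10], [10, 99]]
```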


    7. Background Removal with MODNet

    • Using MODNet: FacePop includes MODNet (a trimap-free portrait matting network) by default, a tool that removes the background from the face image during processing.

    • Focusing on the Face: By eliminating the background, the AI model concentrates solely on the facial features without interference from surrounding elements.

    • Why This Matters: Background elements can sometimes confuse the AI or introduce unwanted artifacts. Removing them leads to cleaner, more accurate facial enhancements.


    How FacePop Addresses Previous Challenges

    • Improved Detail in All Shots: By upscaling the face to a larger size, even faces from long shots (previously only 64x64 pixels) become rich in detail, allowing the AI to process them effectively.

    • Handling Head Tilts: Rotating the face to an upright position before processing eliminates issues caused by angled faces, ensuring consistent results regardless of the original orientation.

    • Consistent Character Appearance: Processing the face separately and reintegrating it carefully ensures that Manuel's facial features remain consistent across close-up, medium, and long shots.

    • Enhanced AI Performance: With better input data (larger, upright faces without background distractions), the AI model can perform at its best, leading to higher-quality outputs.


     

     

    Post edited by charles on
  • charlescharles Posts: 846

    Postwork..next (placeholder)

  • charlescharles Posts: 846
    edited October 11

    Overlays and Advanced Techniques...final (placeholder)

    Post edited by charles on
  • charlescharles Posts: 846
    edited October 29

    sorry haven't updated as quickly as I had hoped, but should have postwork section done by the end of this week.

     

    Post edited by charles on
  • scorpioscorpio Posts: 8,418

    Shouldn't this be in the AI forum rather than cluttering up the Commons.

     

  • Richard HaseltineRichard Haseltine Posts: 100,874

    scorpio said:

    Shouldn't this be in the AI forum rather than cluttering up the Commons.

    It might have gone in Art Studio too, but we do try to keep the AI stuff  in the AI forum so I have moved this.

  • FSMCDesignsFSMCDesigns Posts: 12,755

    Richard Haseltine said:

    scorpio said:

    Shouldn't this be in the AI forum rather than cluttering up the Commons.

    It might have gone in Art Studio too, but we do try to keep the AI stuff  in the AI forum so I have moved this.

    Really, so this forum is for all AI and not just DAZ AI?

  • Richard HaseltineRichard Haseltine Posts: 100,874

    FSMCDesigns said:

    Richard Haseltine said:

    scorpio said:

    Shouldn't this be in the AI forum rather than cluttering up the Commons.

    It might have gone in Art Studio too, but we do try to keep the AI stuff  in the AI forum so I have moved this.

    Really, so this forum is for all AI and not just DAZ AI?

    Yes-ish

  • Nyghtfall3DNyghtfall3D Posts: 776

    Richard Haseltine said:

    Really, so this forum is for all AI and not just DAZ AI?

    Yes-ish

    Ruh-roh...  :: mischievous, knowing smirk ::
