Black Mixture

How to Create AI Images of Yourself with Others! │Multiple Consistent Characters Tutorial

Added 2024-11-04 11:15:01 +0000 UTC

1. Train custom LoRA:

Similar to my previous post, where I initially sent prank photos to my family, I used CivitAI's default settings to train a Flux LoRA on an otherwise uncaptioned dataset of 17 photos of Chriselle and me together. I tagged each photo "Nate and Chriselle" and used no other captions. I've tried using fuller captions, but the results kept feeling overtrained. The default CivitAI LoRA training settings came out better than all my reattempts using FluxGym and messing with the settings. Not that training a LoRA with FluxGym, kohya, or AI Toolkit isn't good; I just don't know why the CivitAI ones came out better in my attempts.
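If you train locally instead of on CivitAI, many trainers read a same-named .txt caption file next to each image. Here's a rough sketch of how the "one tag per photo" setup above could be reproduced that way. The function name and folder layout are made up for illustration, and your trainer may expect a different caption extension, so check its docs.

```python
from pathlib import Path

# Image types to caption; adjust to whatever your dataset contains.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def write_trigger_captions(dataset_dir, trigger="Nate and Chriselle"):
    """Write one .txt caption per image containing only the trigger phrase."""
    written = []
    for image_path in sorted(Path(dataset_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip existing captions and other non-image files
        caption_path = image_path.with_suffix(".txt")
        caption_path.write_text(trigger, encoding="utf-8")
        written.append(caption_path)
    return written
```

Calling `write_trigger_captions("my_dataset")` on a folder of 17 photos would leave 17 caption files, each containing only the tag.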

2. Generate & Control:

I initially made my previous photos with Forge UI and Fooocus, but now I prefer ComfyUI. ComfyUI provides a lot of flexibility, but for 90% of gens I think you can use Forge and have an easier time. So if you're using Forge, just ignore the node references.

Both images were made with Flux Dev. The first, Westworld-inspired picture is based solely on a prompt, while the second, the Batman and Catwoman photo, is a combination of prompts, ControlNets, and Image to Image.

"A cinematic still from a western movie featuring a Black man and a Filipina woman, Nate and Chriselle, dressed in formal Western attire. They stride through an old town street, lined with wooden buildings, near a saloon engulfed in flames. Nate wears a sleek black tuxedo paired with a matching black shirt, while Chriselle is in a stunning blue Victorian-era Western dress with a heart-shaped neckline. Both wear matching cowboy hats, adding a unified look to their attire. A saddled horse runs nearby, outfitted with an old Western saddle and rugged bags. The scene is filled with dramatic smoke and glowing embers swirling in the air, with fiery chaos and thick smoke pouring from building windows, capturing the intensity of the moment. Nate and Chriselle have confident expressions on their face."

3. ControlNets + Img to Img:

To get the best results, I used tools like ControlNets and worked within latent spaces. Unlike our usual three-dimensional space, latent space has thousands of dimensions, allowing the AI to explore an abstract realm of concepts like “freedom” or “movement.” By guiding the AI within this space, we can bring our exact vision to life instead of leaving it up to interpretation.

We used a reference image of Batman and Catwoman from the 2022 film The Batman.

Using the Depth ControlNet for Flux in combination with a 0.95 denoise value, we were able to achieve a similar pose and visual style for our generation.
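To get a feel for what a 0.95 denoise means, here's a toy sketch. It assumes a plain linear blend between the reference latent and random noise, and it treats the denoise value as the starting noise level. Real samplers and schedulers shift these numbers, so this is only a simplification, and the function names and sample values are made up for illustration.

```python
import random


def noised_start(reference_latent, denoise, seed=0):
    """Blend a reference latent with Gaussian noise (assumed linear blend)."""
    rng = random.Random(seed)
    return [
        (1.0 - denoise) * value + denoise * rng.gauss(0.0, 1.0)
        for value in reference_latent
    ]


def reference_share(denoise):
    """Fraction of the reference latent left in the starting point."""
    return round(1.0 - denoise, 4)


reference = [0.8, -0.3, 1.2, 0.1]  # stand-in for a few latent values
start = noised_start(reference, denoise=0.95)
print(reference_share(0.95))  # 0.05
```

Under that assumption, only about 5% of the reference image survives in the starting latent, which would explain why the depth map does most of the work of holding the pose.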

A cinematic movie still of a black man and a Filipina woman Nate and Chriselle. Nate and Chriselle are wearing matching superhero outfits. Nate is wearing a tight fitting tactical batman inspired suit with a cape and utility belt. Chriselle is Catwoman, wearing black leather tights and neck collar. Chriselle is wearing cat ears. Nate and Chriselle are facing each other passionately for a sad kiss. They have concerned facial expressions. The background is blurred in bokeh, yet shows that the couple are high above Gotham city and explosions. Gotham is on heavily damages with flames burning high. The background shows gothic architecture with flames burning from the windows and spires. It is raining heavily. Heavy Rain droplets on their clothing reflect the light and environment. The scene is cast in blue and red orange hues.

The full prompts are super long, but the key finding for two-person LoRAs is that it helps to specify each person in the prompt. So I often describe myself as "a Black man Nate" and my wife as "a Filipina woman" so it doesn't blend us together in either gender or race.
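If you write a lot of prompts, the descriptor trick above can be automated. This is just a small sketch: the helper name is hypothetical, and it only tags the first mention of each name.

```python
import re


def describe_subjects(prompt, descriptors):
    """Prefix the first mention of each trigger name with its descriptor."""
    for name, descriptor in descriptors.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        prompt = re.sub(
            pattern,
            lambda match: f"{descriptor} {match.group(0)}",
            prompt,
            count=1,  # later mentions stay as the bare name
        )
    return prompt


prompt = describe_subjects(
    "A cinematic movie still of Nate and Chriselle facing each other.",
    {"Nate": "a Black man", "Chriselle": "a Filipina woman"},
)
print(prompt)
# A cinematic movie still of a Black man Nate and a Filipina woman Chriselle facing each other.
```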

For sampler settings, these were made with:

Sampler: Euler
Scheduler: Simple
Steps: 25
CFG: 1
Denoise: 0.95
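For anyone driving ComfyUI through its API-format JSON, the settings above would map onto a KSampler node roughly like this. The node IDs in the links ("1", "2", and so on) and the seed are placeholders, not values from my workflow.

```python
import json


def ksampler_node(model, positive, negative, latent, seed=0):
    """Build a KSampler entry in ComfyUI's API-format workflow JSON."""
    return {
        "class_type": "KSampler",
        "inputs": {
            "seed": seed,
            "steps": 25,
            "cfg": 1.0,
            "sampler_name": "euler",
            "scheduler": "simple",
            "denoise": 0.95,
            # Links are [source_node_id, output_index]; IDs here are placeholders.
            "model": model,
            "positive": positive,
            "negative": negative,
            "latent_image": latent,
        },
    }


node = ksampler_node(["1", 0], ["2", 0], ["3", 0], ["4", 0])
print(json.dumps(node, indent=2))
```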

The image then gets passed through the Ultimate SD Upscale node for upscaling at its defaults, except that CFG is changed from 8 to 1 (Flux only really works with CFG 1). Denoise is at 0.25.
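Since the upscaler works tile by tile, it can help to estimate how many tiles (and so how many extra sampling passes) an image will need. The sketch below assumes a 2x upscale with 512x512 tiles, which I believe are the node's defaults but are not stated above, and the 1024x1024 input size is just an example.

```python
import math


def tile_grid(width, height, upscale_by=2.0, tile_width=512, tile_height=512):
    """Return the upscaled size and how many tile columns and rows cover it."""
    out_width = int(width * upscale_by)
    out_height = int(height * upscale_by)
    cols = math.ceil(out_width / tile_width)
    rows = math.ceil(out_height / tile_height)
    return out_width, out_height, cols, rows


out_w, out_h, cols, rows = tile_grid(1024, 1024)  # hypothetical input size
print(out_w, out_h, cols * rows)  # 2048 2048 16
```

With those assumed values, a 1024x1024 image becomes 2048x2048 and is redrawn as 16 tiles, each sampled at the low 0.25 denoise.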

Afterwards, the image is sent to the FaceDetailer node, with the bbox model face_yolov8m.pt for the bbox detector and person_yolov8m-seg.pt for the SEGM detector. For the SAM model, we used sam_vit_b_01ec64.pth.
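A missing detector file is a common reason this step fails to load, so here's a quick check you could run first. The subfolder paths are an assumption based on the usual ComfyUI models layout for these detectors, and the function name is hypothetical, so adjust both to match your install.

```python
from pathlib import Path

# Relative paths under the ComfyUI "models" folder (assumed layout).
EXPECTED_MODELS = {
    "bbox detector": "ultralytics/bbox/face_yolov8m.pt",
    "SEGM detector": "ultralytics/segm/person_yolov8m-seg.pt",
    "SAM model": "sams/sam_vit_b_01ec64.pth",
}


def missing_models(models_dir):
    """Return the roles whose model file is not found under models_dir."""
    root = Path(models_dir)
    return [
        role
        for role, relative_path in EXPECTED_MODELS.items()
        if not (root / relative_path).is_file()
    ]
```

For example, `missing_models("ComfyUI/models")` returns an empty list when all three files are in place.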

UPDATE 11/8:

For the Supporter Exclusive ComfyUI workflow used, download it here: Flux Ultimate ControlNet + Upscale + Face Detailer V3

For the ComfyUI workflow used, download it here: Flux Ultimate Controlnet + Upscale (Speedboost) V2

***The facedetailer node was added later, and I'll upload that version this weekend once it's cleaned up. In the meantime, the linked workflow should have everything working.