Newer is not always better…

It’s been a while since I published my post on Creating Character Images with Stable Diffusion, and technology in the generative AI image creation space marches on. In the original post, my go-to model was CyberRealistic_v33. Since then, both Stable Diffusion and the UI options for interacting with it have evolved. Most significantly, many models have been updated from the 1.5 base model to what are termed XL models, which have significantly more training data behind them and should produce higher-fidelity, more accurate images. Let’s take a look and see if that holds true.

My set-up

I’m running Stable Diffusion on a Mac Pro with an M2 processor, with Automatic1111 as the UI. In my original post, I was using version: v1.6.1  •  python: 3.10.13  •  torch: 2.0.1  •  xformers: N/A  •  gradio: 3.41.2  •  checkpoint: 7a4dbba12f. I have upgraded several components over the past few months, and I’m now running version: v1.8.0-2-gb4d466bc  •  python: 3.11.4  •  torch: 2.1.0  •  xformers: N/A  •  gradio: 3.41.2  •  checkpoint: 44233ad4b7.

The baseline

Here is the final image from my first posting under the old versions of the software and using the CyberRealistic v33 model:

The same prompt on the new versions of the software, still with the CyberRealistic v33 model, produced:

Not bad, but you can see there are differences in what should be an exact reproduction. I may dig into the differences in a future post, but for now I want to see how a newer model performs.

I chose to start this experiment with iNiverse Mix XL (SFW & NSFW) because I liked the results it gave when I tested it for generating an image in my character template series. Here are the results:

The prompts used:

(full body) photograph of a young woman[Joanna Krupa:Summer Glau:0.6] standing under a blossoming cherry tree, stream flowing in background, athletic build, fit, slender waist, narrow hips, platinum blonde hair, french-twist hairstyle, big blue eyes, smiling, (wearing a short (red:1.5) beaded cocktail dress with black details), rim lighting, sunset, twilight, soft focus, dof <lora:epi_noiseoffset2:0.75> <lora:LowRA:0.4> (low key) <lora:add_detail:1>
Negative prompt: Asian-Less-Neg, CyberRealistic_Negative-neg
Steps: 70, Sampler: DPM++ 2M Karras, CFG scale: 25.5, Seed: 1017757063, Face restoration: CodeFormer, Size: 512×512, Model hash: 44233ad4b7, Model: iniverseMixXLSFWNSFW_74Real, Variation seed: 2310841749, Variation seed strength: 0.31, Denoising strength: 0.65, ADetailer model: face_yolov8n.pt, ADetailer prompt: “<lora:lora_perfecteyes_v1_from_v1_160> (perfecteyes blue eyes),\n(flawless skin), beautiful face”, ADetailer confidence: 0.3, ADetailer dilate erode: 4, ADetailer mask blur: 4, ADetailer denoising strength: 0.4, ADetailer inpaint only masked: True, ADetailer inpaint padding: 32, ADetailer model 2nd: hand_yolov8n.pt, ADetailer confidence 2nd: 0.3, ADetailer dilate erode 2nd: 4, ADetailer mask blur 2nd: 4, ADetailer denoising strength 2nd: 0.4, ADetailer inpaint only masked 2nd: True, ADetailer inpaint padding 2nd: 32, ADetailer version: 24.4.2, Hires upscale: 2, Hires upscaler: R-ESRGAN 4x+, Lora hashes: “epi_noiseoffset2: d1131f7207d6, LowRA: 0dfc93870ba3, add_detail: 7c6bad76eb54”, Dynamic thresholding enabled: True, Mimic scale: 4, Separate Feature Channels: True, Scaling Startpoint: MEAN, Variability Measure: AD, Interpolate Phi: 0.93, Threshold percentile: 98.35, Mimic mode: Half Cosine Up, Mimic scale minimum: 4, CFG mode: Half Cosine Up, CFG scale minimum: 3.5, Downcast alphas_cumprod: True, Version: v1.8.0-2-gb4d466bc
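That settings line is hard to compare by eye from one run to the next. Here is a minimal sketch of how it could be split into key/value pairs for side-by-side comparison. The function name `parse_settings` and the shortened sample string are my own illustration, not part of Automatic1111, and the sketch assumes straight double quotes around values that contain commas (the curly quotes shown on this page would need to be replaced first).

```python
import re

# One "Key: value" pair. A value is either a double-quoted string
# (which may contain commas) or a run of characters up to the next comma.
PAIR = re.compile(r'\s*([\w ]+):\s*("(?:\\.|[^\\"])*"|[^,]*)(?:,|$)')


def parse_settings(line: str) -> dict[str, str]:
    """Split a 'Steps: 70, Sampler: ...' settings line into a dict of strings."""
    settings = {}
    for key, value in PAIR.findall(line):
        # Drop surrounding whitespace and any enclosing quotes.
        settings[key.strip()] = value.strip().strip('"')
    return settings


# Shortened sample built from the settings above (hypothetical excerpt).
sample = (
    'Steps: 70, Sampler: DPM++ 2M Karras, CFG scale: 25.5, Size: 512x512, '
    'Lora hashes: "epi_noiseoffset2: d1131f7207d6, LowRA: 0dfc93870ba3"'
)

if __name__ == "__main__":
    for key, value in parse_settings(sample).items():
        print(f"{key:>12} = {value}")
```

Parsing two runs this way and comparing the dictionaries makes it easy to spot which setting actually changed between images.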

Obviously, that is not what I was hoping for from a model that is more than 3x larger. Let’s see if a different XL model works better. I’ll switch to HalcyonSDXL, which is supposed to give better photorealistic results. Additionally, the recommended minimum size for XL models is 768 x 768 pixels, so that is an easy change. They are also supposed to perform better without the CFG Scale fix, so I’ll disable that as well and set the CFG scale to 5.
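I made these changes in the UI, but for anyone who prefers to script their runs, here is a rough sketch of the same settings as a request body for the Automatic1111 web API. It assumes the web UI was started with the `--api` flag and is listening on the default local address. The helper names `build_txt2img_payload` and `send`, and the checkpoint title passed in the example, are hypothetical; use the exact title your own install shows in its checkpoint dropdown.

```python
import json
import urllib.request


def build_txt2img_payload(prompt, negative_prompt="", *, width=768, height=768,
                          steps=70, cfg_scale=5.0,
                          sampler_name="DPM++ 2M Karras",
                          seed=1017757063, checkpoint=None):
    """Build a txt2img request body with the XL-friendly settings described above."""
    payload = {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "width": width,            # XL models want at least 768 x 768
        "height": height,
        "steps": steps,            # carried over from the original settings line
        "cfg_scale": cfg_scale,    # plain CFG of 5, with the CFG Scale fix disabled
        "sampler_name": sampler_name,
        "seed": seed,              # same seed, so runs stay comparable
    }
    if checkpoint:
        # Ask the server to switch models for this request.
        payload["override_settings"] = {"sd_model_checkpoint": checkpoint}
    return payload


def send(payload, base_url="http://127.0.0.1:7860"):
    """POST the payload to a locally running web UI (assumed address)."""
    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)  # "images" holds base64-encoded PNGs


if __name__ == "__main__":
    body = build_txt2img_payload("photograph of a young woman",
                                 checkpoint="HalcyonSDXL")
    print(json.dumps(body, indent=2))
```

Keeping the seed fixed in the payload is the important part: it means any difference between two images comes from the setting you changed and not from a new random starting point.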

Better, but it still is not what I was hoping for. Let’s see what other tweaks might help. The XL model series have different recommended sampling methods as well. For this model, the creator recommends DPM++ 3M SDE Karras. I’ll also bump the sampling steps up to 47. Let’s give it a try.

Still not giving me what I’m looking for. Let’s try pulling the LoRAs from the prompt, since they may be affecting the image:
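Removing the tags by hand works fine for one prompt. For a batch of prompts, a small helper along these lines could do it. This is only a sketch: `strip_loras` is a name I made up, and it assumes every LoRA reference uses the `<lora:name:weight>` form seen in the prompt above.

```python
import re

# Matches tags such as <lora:add_detail:1> or <lora:LowRA:0.4>.
LORA_TAG = re.compile(r"<lora:[^>]+>")


def strip_loras(prompt: str) -> tuple[str, list[str]]:
    """Return the prompt without LoRA tags, plus the list of tags removed."""
    removed = LORA_TAG.findall(prompt)
    cleaned = LORA_TAG.sub("", prompt)
    # Collapse the gaps left behind where the tags used to be.
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    return cleaned, removed


if __name__ == "__main__":
    tail = ("soft focus, dof <lora:epi_noiseoffset2:0.75> "
            "<lora:LowRA:0.4> (low key) <lora:add_detail:1>")
    cleaned, removed = strip_loras(tail)
    print(cleaned)
    print(removed)
```

Returning the removed tags alongside the cleaned prompt makes it easy to add them back one at a time and see which one is changing the image.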

Hmmmm. It looks like the only significant change is some additional detail in her dress, and a less defined right hand. XL models are supposed to perform better with lower sampling steps, so let’s give that a try. Instead of 70 steps, let’s go down to 35 and see what happens.

It looks like the “black details” portion of the prompt has been picked up now, but little else has changed. One last change: lowering the CFG scale. XL models seem to have a much lower CFG scale sweet spot than the baseline 1.5 models. Let’s drop the CFG scale to 3.5 and see if it makes a difference.

Our beading in the dress is back, but she’s either gained a leg, or has very strange anatomy. I’m not convinced this is a win for the image. Nudging the CFG scale up to 4 gives me:

The leg issue is cleared up with that change. Let’s keep these prompt settings and see how a couple of other XL-based models perform.

First up, RealCartoonXL_v6. Contrary to what the name implies, this model does a good job with photorealistic images as well. However, the results do look a little ‘cartoonish’ with our existing prompt.
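For a model-to-model comparison like this, the key is holding everything except the checkpoint constant. Here is a minimal sketch of how the runs could be laid out as data before sending them anywhere. The field names follow the Automatic1111 txt2img API as I understand it, while `comparison_runs`, `BASE`, and the checkpoint titles are illustrative assumptions; the titles in your install will likely include file names or hashes.

```python
from itertools import product

# Settings arrived at above: 35 steps, 768 x 768, the creator-recommended
# sampler, and the original seed so every model starts from the same noise.
BASE = {
    "seed": 1017757063,
    "steps": 35,
    "width": 768,
    "height": 768,
    "sampler_name": "DPM++ 3M SDE Karras",
}


def comparison_runs(prompt, checkpoints, cfg_scales=(4.0,)):
    """Return one request body per (checkpoint, CFG scale) combination."""
    runs = []
    for checkpoint, cfg in product(checkpoints, cfg_scales):
        run = dict(BASE, prompt=prompt, cfg_scale=cfg)
        # Only the checkpoint (and optionally the CFG scale) varies between runs.
        run["override_settings"] = {"sd_model_checkpoint": checkpoint}
        runs.append(run)
    return runs


if __name__ == "__main__":
    models = ["RealCartoonXL_v6", "EpicRealismXL_v5", "EpicRealismXL_v7"]
    for run in comparison_runs("photograph of a young woman", models):
        print(run["override_settings"]["sd_model_checkpoint"], run["cfg_scale"])
```

Passing more than one value in `cfg_scales`, for example `(3.5, 4.0)`, turns the same helper into a small grid, which is handy for checking whether a problem like the extra leg belongs to the model or to the CFG setting.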

Next, let’s see how EpicRealismXL_v5 performs.

How about EpicRealismXL_v7?

A note on the model page suggests removing the negative prompt for XL-based models, so let’s see what that does:

To bring this experiment full circle, here is the same prompt used above with my original model (CyberRealistic_v33).

Conclusion

Newer is not always better. From an aesthetic perspective, I still prefer the original image using a 1.5 model. However, the newer models and “higher fidelity” versions do simplify the prompting for initial results. I’m certain the updated models have capabilities and adjustments I’m not using, so please do test your own images and prompts to see what appeals to you. For me, the 1.5 generation of models is still faster and more appealing.


Follow me on Amazon, Goodreads, or Facebook to get information about upcoming book releases.
