I tried SD 9 months ago, and since then it has gotten a lot easier to set up and generate decent images.

Setup

Test Runs

SD1.5 img2img

Using A1111, I transformed my drawing from this: picture

to this: SD generated
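
I did this in A1111's img2img tab, but the same idea can be sketched with the diffusers library. The model name, prompt, and strength below are placeholders, not my exact settings:

    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from diffusers.utils import load_image

    # SD1.5 img2img: start from an existing picture and denoise it toward the prompt.
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    init_image = load_image("my_drawing.png").resize((512, 512))

    # strength controls how much of the original drawing survives (0 = unchanged, 1 = ignore it).
    result = pipe(
        prompt="a detailed painting of ...",  # placeholder prompt
        image=init_image,
        strength=0.6,
        guidance_scale=7.5,
    ).images[0]
    result.save("img2img_result.png")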

SDXL

I wanted to try out SDXL models. The problem was that I couldn't run them in A1111 locally, even with the --medvram-sdxl command line arg. 6GB of VRAM isn't enough for SDXL.

I could use a paid cloud service (free ones usually had long queues, and not many supported SDXL back then), which I will do in the future.

While browsing the internet, I came across ComfyUI, which was said to use less memory for SDXL than A1111. It did work: I could generate an image with SDXL models, though admittedly it took a long time.
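
I haven't dug into what ComfyUI actually does differently under the hood. For comparison, this is roughly what the same kind of memory saving looks like in diffusers: fp16 weights plus CPU offloading, which is the sort of thing that lets SDXL squeeze into a small VRAM budget. This is an illustration, not ComfyUI's mechanism:

    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,  # half precision halves the weight memory
        variant="fp16",
        use_safetensors=True,
    )
    # Keep only the submodule currently in use on the GPU; park the rest in system RAM.
    pipe.enable_model_cpu_offload()
    # Decode the latent in tiles so the VAE doesn't need one huge allocation.
    pipe.enable_vae_tiling()

    image = pipe("a lighthouse on a cliff at sunset", num_inference_steps=30).images[0]
    image.save("sdxl_lowvram.png")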

It was a big learning curve going from a GUI to node-based coding. I hadn't had good experiences with low-code visual programming, so I started with a heavy bias. Playing with ComfyUI still didn't convince me that drag-and-drop coding is better. A lot of the time, I wished it were normal programming: just a few lines of code could replace multiple nodes and connections.

I didn’t know what I was doing, and I’m still learning now. My first major attempt was to generate an image with an SDXL model, upscale the latent by 2, then pass it to an SD1.5 model. In other words, the process was txt2img, then latent upscale, then img2img. Why did I do it? Just to see how hard it was to set up in ComfyUI.
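
Outside ComfyUI, the same three stages can be sketched with diffusers. One caveat: my ComfyUI graph upscales the latent directly, but SDXL and SD1.5 use different VAEs, so this sketch decodes to an image, upscales that, and goes back in via img2img. Model names, prompts, and strengths are placeholders:

    import torch
    from diffusers import StableDiffusionXLPipeline, StableDiffusionImg2ImgPipeline

    # Stage 1: txt2img with an SDXL model.
    sdxl = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16"
    )
    sdxl.enable_model_cpu_offload()
    base = sdxl("a clown juggling in a circus tent", num_inference_steps=30).images[0]

    # Stage 2: upscale by 2 (a plain resize here; the ComfyUI graph does this on the latent).
    upscaled = base.resize((base.width * 2, base.height * 2))

    # Stage 3: img2img with an SD1.5 model to restyle and add detail.
    sd15 = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    )
    sd15.enable_model_cpu_offload()
    final = sd15(
        prompt="anime style, a clown juggling in a circus tent",  # placeholder prompt
        image=upscaled,
        strength=0.5,
    ).images[0]
    final.save("sdxl_to_sd15.png")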

It took a few days before I got something working. These are the fruits of my work:

Generated from an SDXL model: SDXL generated img

Used an anime SD1.5 model to change the style of the first image. SD1.5 img2img

Running the workflow again with a different seed and a different anime SD1.5 model gave me this: another SD1.5 img2img

This is my messy but working workflow: ComfyUI workflow

The images look interesting, but there are a lot of nonsensical details. For example, the clown doesn’t know how to hold an object. I wonder whether improving my prompts could fix that. AI doesn’t seem to be good at drawing hands at the moment.

ControlNet

Scribble

My drawing: my drawing

I used the Scribble model to transform it into an image: from drawing to image
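
A hedged sketch of the Scribble setup in diffusers; the ControlNet checkpoint name is the public lllyasviel one, and the prompt is a placeholder rather than what I actually used:

    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    # The scribble ControlNet conditions generation on a rough line sketch.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    )
    pipe.enable_model_cpu_offload()

    scribble = load_image("my_drawing.png")  # white lines on a black background work best
    image = pipe(
        "a cozy cabin in the woods, detailed illustration",  # placeholder prompt
        image=scribble,
        num_inference_steps=25,
    ).images[0]
    image.save("scribble_result.png")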

Pose

Got a royalty-free stock image and processed it into a control image: pose

Applied the OpenPose model to generate a new image: character with new pose
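
Roughly the same two steps (extract the pose, then condition on it) can be sketched in diffusers with the controlnet_aux preprocessor. Checkpoint names are the public ones, not necessarily the exact files I used:

    import torch
    from controlnet_aux import OpenposeDetector
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    # Step 1: turn the stock photo into a stick-figure pose map.
    openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
    pose_image = openpose(load_image("stock_photo.jpg"))

    # Step 2: generate a new character constrained to that pose.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    )
    pipe.enable_model_cpu_offload()

    image = pipe(
        "a knight in ornate armor, full body",  # placeholder prompt
        image=pose_image,
        num_inference_steps=25,
    ).images[0]
    image.save("pose_result.png")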

SDXL Turbo

This is faster than regular SDXL, as it requires far fewer sampling steps. Images generated with SDXL Turbo have little detail, so they need to be upscaled and processed via img2img.
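
The key difference is the sampler settings: very few steps and no classifier-free guidance. A minimal diffusers sketch, following the model card's settings rather than my exact ComfyUI graph:

    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
    )
    pipe.enable_model_cpu_offload()

    # Turbo is distilled for 1-4 steps and runs without guidance (guidance_scale=0).
    image = pipe(
        "a clown juggling in a circus tent",  # placeholder prompt
        num_inference_steps=1,
        guidance_scale=0.0,
    ).images[0]
    image.save("sdxl_turbo.png")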

Generated from an SDXL Turbo model: SDXL Turbo image

After upscaling the image by 2 and processing it with an SD1.5 model: postprocessed image

I also tried SD1.5 LCM; it is fast, but the quality is similarly low.
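
In diffusers, a low-step SD1.5 setup like this would typically use the LCM-LoRA together with the LCM scheduler. This is a sketch of that approach, not necessarily the exact checkpoint I ran in ComfyUI:

    import torch
    from diffusers import StableDiffusionPipeline, LCMScheduler

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    )
    # Swap in the LCM scheduler and load the distillation LoRA.
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")
    pipe.enable_model_cpu_offload()

    # 4 steps with low guidance is the usual LCM regime: fast, but softer details.
    image = pipe(
        "a clown juggling in a circus tent",  # placeholder prompt
        num_inference_steps=4,
        guidance_scale=1.5,
    ).images[0]
    image.save("sd15_lcm.png")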

Current Workflow

  • Use SDXL to create an image.
  • Upscale with an AI upscaler (Real-ESRGAN), then downscale the result to 2x the original size (see the sketch after this list).
  • Pass the image through img2img using an SD1.5 model.
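
The middle step, in code terms, is just: run the image through the upscaler (typically 4x), then resize the result down to twice the original dimensions so the SD1.5 pass isn't working at an unmanageable resolution. A tiny sketch, with the Real-ESRGAN call left as an external step and hypothetical filenames:

    from PIL import Image

    orig = Image.open("sdxl_output.png")            # e.g. 1024x1024 from SDXL
    up = Image.open("sdxl_output_realesrgan.png")   # 4x output written by Real-ESRGAN externally

    # Downscale the 4x upscale back to 2x the original size before the SD1.5 img2img pass.
    target = (orig.width * 2, orig.height * 2)
    up.resize(target, Image.LANCZOS).save("ready_for_img2img.png")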

TODO

  • Animation
  • Learn how to fix hands in GIMP or Krita
  • SDXL pony
  • SD3

References