🎨 Stable Diffusion
Setup
Test Runs
SD1.5 img2img
Using A1111, I transformed my drawing from this:
to this:
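For comparison, here is roughly what that img2img step looks like as code, using Hugging Face's diffusers library. This is only a sketch, not what A1111 runs internally; the model, file names, and prompt are placeholders.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# strength plays the role of A1111's denoising strength slider:
# higher values let the result drift further from the input drawing
drawing = load_image("my_drawing.png")
result = pipe("a detailed fantasy landscape, concept art",
              image=drawing, strength=0.7).images[0]
result.save("result.png")
```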
SDXL
I wanted to try out SDXL models.
The problem was that I couldn't run the models in A1111 locally, even with the --medvram-sdxl command line arg. 6GB of VRAM isn't enough for SDXL.
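For what it's worth, the diffusers library has its own memory-saving switches that serve the same purpose as --medvram-sdxl. A minimal sketch of the usual low-VRAM setup, untested on my 6GB card; the prompt is a placeholder:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16",
)
# Keep only the active submodel on the GPU and decode the VAE in tiles,
# trading speed for a much smaller VRAM footprint
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

image = pipe("a clown juggling in a circus tent").images[0]
image.save("sdxl.png")
```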
While browsing the internet, I came across ComfyUI, which was supposed to use less memory for SDXL than A1111. It did work: I could generate images with SDXL models, though admittedly it took a long time.
It was a steep learning curve going from a GUI to node-based coding. I hadn't had good experiences with low-code visual programming before, so I began with a heavy bias. Playing with ComfyUI didn't convince me that drag-and-drop coding was better. A lot of the time, I wished it were ordinary programming: just a few lines of code could replace multiple nodes and connections.
My first major attempt was to generate an image with an SDXL model, upscale the latent by 2x, then pass it to an SD1.5 model. In other words, the process was txt2img, then latent upscale, then img2img. Why did I do it? Just to see how hard it would be to set up in ComfyUI.
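To show what I mean by a few lines of code, here is roughly the same pipeline sketched in Python with diffusers. It's a sketch, not my actual workflow: the model names and prompts are placeholders, and it hands off in image space instead of latent space, since SDXL and SD1.5 latents aren't directly interchangeable.

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionImg2ImgPipeline

# Stage 1: txt2img with an SDXL model
sdxl = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
base = sdxl("a clown juggling in a circus tent").images[0]

# Stage 2: upscale by 2x (image-space stand-in for ComfyUI's latent
# upscale, since SDXL and SD1.5 latent spaces aren't interchangeable)
upscaled = base.resize((base.width * 2, base.height * 2))

# Stage 3: img2img with an anime-style SD1.5 model to restyle it
sd15 = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
styled = sd15("anime style, a clown juggling",
              image=upscaled, strength=0.6).images[0]
styled.save("styled.png")
```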
In ComfyUI, though, it took a few days before I got something working. These are the fruits of my work:
Generated from an SDXL model:
Used an anime SD1.5 model to change the style of the first image:
Running the workflow again with a different seed and a different anime SD1.5 model gave me this:
This is my messy but working workflow:
The images look interesting, but there are a lot of nonsensical details. For example, the clown doesn't know how to hold an object. I wonder if improving my prompts could fix that. AI doesn't seem to be good at drawing hands at the moment.
Scribble
My drawing:
I used a ControlNet scribble model to transform it into an image:
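In code, the scribble step looks roughly like this with diffusers. The checkpoint names are the publicly available ones; the file name and prompt are placeholders.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Attach the scribble ControlNet to an SD1.5 pipeline
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# The scribble (white lines on a black background) fixes the composition
scribble = load_image("my_drawing.png")
image = pipe("a cozy cottage in a forest", image=scribble).images[0]
image.save("scribble_result.png")
```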
Pose
I got a royalty-free stock image and processed it into a control image:
Then I applied the OpenPose model to generate a new image:
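Both steps can be sketched in code: the controlnet_aux package extracts the pose skeleton, and diffusers generates from it. Again a sketch; the file name and prompt are placeholders.

```python
import torch
from controlnet_aux import OpenposeDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Step 1: turn the stock photo into a pose-skeleton control image
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
pose = openpose(load_image("stock_photo.jpg"))

# Step 2: generate a new image that follows the extracted pose
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")
image = pipe("a dancer on a beach at sunset", image=pose).images[0]
image.save("pose_result.png")
```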
SDXL Turbo
This is faster than regular SDXL, as it requires fewer sampling steps. The image generated by SDXL Turbo has low detail, so it needs to be upscaled and processed via img2img.
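A minimal sketch of the Turbo step with diffusers; the prompt is a placeholder. The key difference from regular SDXL is that Turbo is distilled to sample in very few steps, with guidance disabled:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Turbo is distilled for 1-4 sampling steps; guidance must be off
image = pipe("a clown juggling", num_inference_steps=1,
             guidance_scale=0.0).images[0]
image.save("turbo.png")
```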
Generated from an SDXL Turbo model:
After upscaling the image by 2x and processing it with an SD1.5 model:
I also tried SD1.5 LCM; the speed is fast, but the quality is too low.
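For reference, one common way to run SD1.5 with LCM is the LCM LoRA. A sketch with diffusers; I'm not certain this matches the exact setup I used, and the prompt is a placeholder:

```python
import torch
from diffusers import StableDiffusionPipeline, LCMScheduler

# Swap in the LCM scheduler and load the distilled LCM LoRA weights
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# 4 steps and low guidance instead of the usual 20-30 steps
image = pipe("a clown juggling", num_inference_steps=4,
             guidance_scale=1.0).images[0]
image.save("lcm.png")
```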
Current Workflow
- Use SDXL to create an image.
- Upscale with a Real-ESRGAN upscaler, then downscale the result to 2x the original size (sketched below).
- Pass the image through img2img using an SD1.5 model.
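The only new piece compared with the earlier sketch is the upscale step: instead of a plain resize, the image goes through Real-ESRGAN and is then downscaled to 2x the original resolution. A rough sketch, assuming the realesrgan Python package and a downloaded RealESRGAN_x4plus.pth weights file:

```python
import numpy as np
from PIL import Image
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

def upscale_2x(image: Image.Image) -> Image.Image:
    # Real-ESRGAN's x4plus model upscales by 4x...
    model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64,
                    num_block=23, num_grow_ch=32, scale=4)
    upsampler = RealESRGANer(scale=4, model_path="RealESRGAN_x4plus.pth",
                             model=model)
    out, _ = upsampler.enhance(np.array(image)[:, :, ::-1].copy())  # RGB -> BGR
    four_x = Image.fromarray(out[:, :, ::-1].copy())                # BGR -> RGB
    # ...then downscale to 2x the original resolution
    return four_x.resize((image.width * 2, image.height * 2))
```

The 2x result then goes into the same SD1.5 img2img step as in the earlier sketch.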