🎨 Stable Diffusion
Setup
Test Runs
SD1.5 img2img
Using A1111, I transformed my drawing from this:
to this:
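For comparison, here is roughly what that img2img step looks like as code, using Hugging Face's diffusers library. This is only a sketch, not what A1111 runs internally; the model, file names, and prompt are placeholders.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# strength plays the role of A1111's denoising strength slider:
# higher values let the result drift further from the input drawing
drawing = load_image("my_drawing.png")
result = pipe("a detailed fantasy landscape, concept art",
              image=drawing, strength=0.7).images[0]
result.save("result.png")
```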
SDXL
I wanted to try out SDXL models.
The problem was that I couldn't run the models in A1111 locally, even with the --medvram-sdxl command line arg. 6GB of VRAM isn't enough for SDXL.
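For what it's worth, the diffusers library has its own memory-saving switches that serve the same purpose as --medvram-sdxl. A minimal sketch of the usual low-VRAM setup, untested on my 6GB card; the prompt is a placeholder:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16",
)
# Keep only the active submodel on the GPU and decode the VAE in tiles,
# trading speed for a much smaller VRAM footprint
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

image = pipe("a clown juggling in a circus tent").images[0]
image.save("sdxl.png")
```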
While browsing the internet, I came across ComfyUI, which was supposed to use less memory for SDXL than A1111. It did work: I could generate images with SDXL models, though admittedly it took a long time.
It was a steep learning curve going from a GUI to node-based coding. I hadn't had good experiences with low-code visual programming before, so I began with a heavy bias. Playing with ComfyUI didn't convince me that drag-and-drop coding was better. A lot of the time, I wished it were ordinary programming: just a few lines of code could replace multiple nodes and connections.
My first major attempt was to generate an image with an SDXL model, upscale the latent by 2x, then pass it to an SD1.5 model. In other words, the process was txt2img, then latent upscale, then img2img. Why did I do it? Just to see how hard it would be to set up in ComfyUI.
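To show what I mean by a few lines of code, here is roughly the same pipeline sketched in Python with diffusers. It's a sketch, not my actual workflow: the model names and prompts are placeholders, and it hands off in image space instead of latent space, since SDXL and SD1.5 latents aren't directly interchangeable.

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionImg2ImgPipeline

# Stage 1: txt2img with an SDXL model
sdxl = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
base = sdxl("a clown juggling in a circus tent").images[0]

# Stage 2: upscale by 2x (image-space stand-in for ComfyUI's latent
# upscale, since SDXL and SD1.5 latent spaces aren't interchangeable)
upscaled = base.resize((base.width * 2, base.height * 2))

# Stage 3: img2img with an anime-style SD1.5 model to restyle it
sd15 = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
styled = sd15("anime style, a clown juggling",
              image=upscaled, strength=0.6).images[0]
styled.save("styled.png")
```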
In ComfyUI, though, it took a few days before I got something working. These are the fruits of my work:
Generated from an SDXL model:
Used an anime SD1.5 model to change the style of the first image:
Running the workflow again with a different seed and a different anime SD1.5 model gave me this:
This is my messy but working workflow:
The images look interesting, but there are a lot of nonsensical details. For example, the clown doesn't know how to hold an object. I wonder if improving my prompts could fix that. AI doesn't seem to be good at drawing hands at the moment.
Scribble
My drawing:
I used a ControlNet scribble model to transform it into an image:
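In code, the scribble step looks roughly like this with diffusers. The checkpoint names are the publicly available ones; the file name and prompt are placeholders.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Attach the scribble ControlNet to an SD1.5 pipeline
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# The scribble (white lines on a black background) fixes the composition
scribble = load_image("my_drawing.png")
image = pipe("a cozy cottage in a forest", image=scribble).images[0]
image.save("scribble_result.png")
```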
Pose
I got a royalty-free stock image and processed it into a control image:
Then I applied the OpenPose model to generate a new image:
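Both steps can be sketched in code: the controlnet_aux package extracts the pose skeleton, and diffusers generates from it. Again a sketch; the file name and prompt are placeholders.

```python
import torch
from controlnet_aux import OpenposeDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Step 1: turn the stock photo into a pose-skeleton control image
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
pose = openpose(load_image("stock_photo.jpg"))

# Step 2: generate a new image that follows the extracted pose
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")
image = pipe("a dancer on a beach at sunset", image=pose).images[0]
image.save("pose_result.png")
```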
SDXL Turbo
This is faster than regular SDXL, as it requires fewer sampling steps. The image generated by SDXL Turbo has low detail, so it needs to be upscaled and processed via img2img.
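A minimal sketch of the Turbo step with diffusers; the prompt is a placeholder. The key difference from regular SDXL is that Turbo is distilled to sample in very few steps, with guidance disabled:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Turbo is distilled for 1-4 sampling steps; guidance must be off
image = pipe("a clown juggling", num_inference_steps=1,
             guidance_scale=0.0).images[0]
image.save("turbo.png")
```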
Generated from an SDXL Turbo model:
After upscaling the image by 2x and processing it with an SD1.5 model:
I also tried SD1.5 LCM; the speed is fast, but the quality is too low.
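For reference, one common way to run SD1.5 with LCM is the LCM LoRA. A sketch with diffusers; I'm not certain this matches the exact setup I used, and the prompt is a placeholder:

```python
import torch
from diffusers import StableDiffusionPipeline, LCMScheduler

# Swap in the LCM scheduler and load the distilled LCM LoRA weights
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# 4 steps and low guidance instead of the usual 20-30 steps
image = pipe("a clown juggling", num_inference_steps=4,
             guidance_scale=1.0).images[0]
image.save("lcm.png")
```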
Current Workflow
- Use SDXL to create an image.
- Upscale with a Real-ESRGAN upscaler, then downscale the result to 2x the original size (sketched below).
- Pass the image through img2img using an SD1.5 model.
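The only new piece compared with the earlier sketch is the upscale step: instead of a plain resize, the image goes through Real-ESRGAN and is then downscaled to 2x the original resolution. A rough sketch, assuming the realesrgan Python package and a downloaded RealESRGAN_x4plus.pth weights file:

```python
import numpy as np
from PIL import Image
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

def upscale_2x(image: Image.Image) -> Image.Image:
    # Real-ESRGAN's x4plus model upscales by 4x...
    model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64,
                    num_block=23, num_grow_ch=32, scale=4)
    upsampler = RealESRGANer(scale=4, model_path="RealESRGAN_x4plus.pth",
                             model=model)
    out, _ = upsampler.enhance(np.array(image)[:, :, ::-1].copy())  # RGB -> BGR
    four_x = Image.fromarray(out[:, :, ::-1].copy())                # BGR -> RGB
    # ...then downscale to 2x the original resolution
    return four_x.resize((image.width * 2, image.height * 2))
```

The 2x result then goes into the same SD1.5 img2img step as in the earlier sketch.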