issues/13-003b-implement-stable-diffusion-api-integration.md

Issue 13-003b: Implement Stable Diffusion API Integration

Priority

High (blocks 13-003c, 13-003d)

Parent Issue

13-003: Generate Stable Diffusion Visuals from Flopsopoly Sequence

Current Behavior

The project integrates with Ollama for embeddings via its HTTP API (libs/ollama-config.lua). No Stable Diffusion integration exists yet. The user has a locally hosted Stable Diffusion instance available at a configurable IP:port.

Intended Behavior

Implement a Lua wrapper module that calls a local Stable Diffusion API (Automatic1111/SDAPI or ComfyUI format) to generate images from text prompts.

API Patterns

Automatic1111 Web UI API (most common):

POST http://IP:PORT/sdapi/v1/txt2img
Content-Type: application/json

{
    "prompt": "silence fire memory ocean dream",
    "negative_prompt": "text, watermark, blurry",
    "width": 1024,
    "height": 1024,
    "steps": 20,
    "cfg_scale": 7.0,
    "sampler_name": "Euler a"
}

Response:
{
    "images": ["base64_encoded_png_data"],
    "parameters": {...},
    "info": "..."
}

ComfyUI API (alternative):

POST http://IP:PORT/prompt
{
    "prompt": { ... workflow JSON ... }
}

For this issue, focus on Automatic1111 API as the primary target. ComfyUI support can be a future enhancement.

Lua Wrapper Interface

-- libs/stable-diffusion.lua

-- {{{ sd.init
-- Initialize stable diffusion client with configuration
-- @param config: table with endpoint, model settings
-- @return boolean success, string error_message
local function init(config)
end
-- }}}

-- {{{ sd.txt2img
-- Generate image from text prompt
-- @param prompt: string prompt for image generation
-- @param output_path: path to save generated image
-- @param options: optional overrides for width, height, steps, etc.
-- @return boolean success, string error_message
local function txt2img(prompt, output_path, options)
end
-- }}}

-- {{{ sd.img2img
-- Generate image from text prompt + input image (for multi-pass)
-- @param prompt: string prompt for image generation
-- @param init_image_path: path to input image
-- @param output_path: path to save generated image
-- @param options: denoising_strength, etc.
-- @return boolean success, string error_message
local function img2img(prompt, init_image_path, output_path, options)
end
-- }}}

-- {{{ sd.check_connection
-- Verify stable diffusion API is reachable
-- @return boolean connected, string error_message
local function check_connection()
end
-- }}}
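
The img2img signature above implies a request body that carries the input image alongside the prompt. The following is a minimal sketch of building that body, assuming the Automatic1111 /sdapi/v1/img2img endpoint with its init_images and denoising_strength fields. The helper name build_img2img_payload and the 0.75 fallback are hypothetical choices for illustration, not part of this issue's spec:

```lua
-- Hypothetical helper: builds the JSON-ready table for an img2img request.
-- init_image_b64 is the input image, already base64-encoded by the caller.
local function build_img2img_payload(prompt, init_image_b64, config, options)
    config = config or {}
    options = options or {}
    return {
        prompt = prompt,
        negative_prompt = options.negative_prompt or config.negative_prompt or "",
        init_images = { init_image_b64 },  -- the API expects a list of images
        denoising_strength = options.denoising_strength or 0.75,  -- assumed fallback
        width = options.width or config.width or 1024,
        height = options.height or config.height or 1024,
        steps = options.steps or config.steps or 20,
        cfg_scale = options.cfg_scale or config.cfg_scale or 7.0,
    }
end

local payload = build_img2img_payload(
    "silence fire memory ocean dream",
    "aGVsbG8=",  -- placeholder base64 string, not a real image
    { width = 512, steps = 20 },
    { denoising_strength = 0.4 }
)
```

Per-call options take precedence over config values, which in turn take precedence over the hard-coded fallbacks.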

Configuration

-- In config.lua:
stable_diffusion = {
    -- Connection (required)
    endpoint = "",  -- Must be configured, e.g. "http://192.168.0.115:7860"

    -- Model settings
    width = 1024,
    height = 1024,
    steps = 20,
    cfg_scale = 7.0,
    sampler = "Euler a",
    negative_prompt = "text, watermark, blurry, low quality, deformed",

    -- Timeouts
    timeout_seconds = 120,  -- Per-image generation timeout
    retry_on_timeout = true,
    max_retries = 2,
}
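
The interface lets callers pass an options table that overrides these defaults per call. One way this could work is a shallow merge, sketched below. The helper name merge_options is an assumption for illustration and does not appear elsewhere in this issue:

```lua
-- Hypothetical helper: returns a new table with per-call options layered
-- over the configured defaults. Neither input table is modified.
local function merge_options(defaults, options)
    local merged = {}
    for key, value in pairs(defaults or {}) do
        merged[key] = value
    end
    for key, value in pairs(options or {}) do
        merged[key] = value
    end
    return merged
end

local defaults = { width = 1024, height = 1024, steps = 20 }
local effective = merge_options(defaults, { steps = 10 })
```

Because the merge copies into a fresh table, a test run with {steps = 10} does not leak into later calls that rely on the configured value.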

Technical Design

HTTP Request via curl

Follow the Ollama integration pattern using curl:

-- {{{ local function call_txt2img_api
local function call_txt2img_api(prompt, config)
    local payload = {
        prompt = prompt,
        negative_prompt = config.negative_prompt or "",
        width = config.width or 1024,
        height = config.height or 1024,
        steps = config.steps or 20,
        cfg_scale = config.cfg_scale or 7.0,
        sampler_name = config.sampler or "Euler a",
    }

    local payload_json = dkjson.encode(payload)
    local temp_request = DIR .. "/tmp/sd_request.json"
    local temp_response = DIR .. "/tmp/sd_response.json"

    utils.write_file(temp_request, payload_json)

    local cmd = string.format(
        'curl -s -X POST "%s/sdapi/v1/txt2img" ' ..
        '-H "Content-Type: application/json" ' ..
        '-d @"%s" ' ..
        '--max-time %d ' ..
        '-o "%s"',
        config.endpoint,
        temp_request,
        config.timeout_seconds or 120,
        temp_response
    )

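    -- os.execute returns a status number on Lua 5.1/LuaJIT, but
    -- (ok, "exit"|"signal", code) on Lua 5.2+; accept both forms.
    local ok, _, code = os.execute(cmd)
    if not (ok == true or ok == 0) then
        return nil, "curl failed with exit code: " .. tostring(code or ok)
    end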

    local response_text = utils.read_file(temp_response)
    if not response_text then
        return nil, "Failed to read response file"
    end

    local response = dkjson.decode(response_text)
    if not response or not response.images or not response.images[1] then
        return nil, "Invalid response: no images returned"
    end

    return response.images[1]  -- Base64 encoded image
end
-- }}}

Base64 Image Decoding

Stable Diffusion returns images as base64-encoded PNG data. Decode the data and save it to the output path:

-- {{{ local function save_base64_image
local function save_base64_image(base64_data, output_path)
    -- Use base64 command-line tool for decoding
    local temp_b64 = DIR .. "/tmp/image.b64"
    utils.write_file(temp_b64, base64_data)

    local cmd = string.format(
        'base64 -d "%s" > "%s"',
        temp_b64, output_path
    )

    -- Accept both Lua 5.1 (status number) and Lua 5.2+ (boolean) results
    local ok = os.execute(cmd)
    return ok == true or ok == 0
end
-- }}}
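
If the base64 command-line tool is unavailable, decoding could instead be done in pure Lua. The sketch below is an alternative to the approach above, not the approach this issue prescribes. The names base64_decode and save_base64_image_pure are hypothetical, and the decoder is expected to be slower than the CLI tool on large images:

```lua
local B64_ALPHABET =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
local B64_VALUE = {}
for i = 1, #B64_ALPHABET do
    B64_VALUE[B64_ALPHABET:sub(i, i)] = i - 1
end

-- Hypothetical helper: decodes standard base64 text into a binary string.
local function base64_decode(data)
    -- Drop padding, newlines, and anything outside the base64 alphabet
    data = data:gsub("[^%w+/]", "")
    local out = {}
    for i = 1, #data, 4 do
        local chunk = data:sub(i, i + 3)
        local n, bits = 0, 0
        for j = 1, #chunk do
            n = n * 64 + B64_VALUE[chunk:sub(j, j)]
            bits = bits + 6
        end
        -- Left-align a short final chunk to a full 24-bit group
        for _ = #chunk + 1, 4 do
            n = n * 64
        end
        local bytes = string.char(
            math.floor(n / 65536) % 256,
            math.floor(n / 256) % 256,
            n % 256
        )
        -- Keep only the bytes that the chunk actually encoded
        out[#out + 1] = bytes:sub(1, math.floor(bits / 8))
    end
    return table.concat(out)
end

-- Hypothetical helper: decodes and writes the image in binary mode.
local function save_base64_image_pure(base64_data, output_path)
    local handle, err = io.open(output_path, "wb")
    if not handle then
        return false, err
    end
    handle:write(base64_decode(base64_data))
    handle:close()
    return true
end
```

Writing with mode "wb" matters on platforms that translate line endings, since PNG data is binary.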

Connection Check

-- {{{ local function check_connection
local function check_connection(config)
    local cmd = string.format(
        'curl -s -o /dev/null -w "%%{http_code}" "%s/sdapi/v1/options" --max-time 5',
        config.endpoint
    )

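    local handle = io.popen(cmd)
    if not handle then
        return false, "Failed to start curl"
    end
    -- Strip whitespace so the status comparison is exact
    local status_code = handle:read("*a"):gsub("%s+", "")
    handle:close()

    if status_code == "200" then
        return true, nil
    elseif status_code == "000" then
        -- curl prints 000 when no HTTP response was received
        return false, "API unreachable at " .. config.endpoint
    else
        return false, "API returned status: " .. status_code
    end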
end
-- }}}

Suggested Implementation Steps

Create libs/stable-diffusion.lua — Module skeleton with vimfolds
Implement check_connection() — Verify API reachability
Implement txt2img(prompt, output_path, options) — Core generation
Implement img2img(prompt, init_image_path, output_path, options) — For multi-pass (13-003d)
Add base64 decoding — Save returned images as PNG files
Add retry logic — Handle timeouts and transient failures
Add configuration schema — stable_diffusion section in config.lua
Create test script — Generate a single test image
Document in libs/stable-diffusion.info.md

Deliverables

[ ] libs/stable-diffusion.lua — API wrapper module
[ ] libs/stable-diffusion.info.md — Interface documentation
[ ] check_connection() — API health check
[ ] txt2img() — Text-to-image generation
[ ] img2img() — Image-to-image generation (for 13-003d)
[ ] Base64 decoding and image saving
[ ] Retry logic for timeouts
[ ] Configuration schema in config.lua
[ ] Test script: scripts/test-stable-diffusion.sh

Testing

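-- Assumes sd is the loaded libs/stable-diffusion.lua module and that
-- sd.init(config) has already succeeded.

-- Minimal helper used by the assertions below
local function file_exists(path)
    local f = io.open(path, "rb")
    if f then f:close() end
    return f ~= nil
end

-- Test: connection check
local connected, err = sd.check_connection()
assert(connected, "Failed to connect: " .. (err or "unknown"))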

-- Test: generate single image
local success, err = sd.txt2img(
    "a peaceful sunset over mountains, digital art",
    "tmp/test_image.png",
    {steps = 10}  -- Faster for testing
)
assert(success, "txt2img failed: " .. (err or "unknown"))
assert(file_exists("tmp/test_image.png"), "Output image not created")

-- Test: verify image is valid PNG
local handle = io.popen('file tmp/test_image.png')
local file_type = handle:read("*a")
handle:close()
assert(file_type:find("PNG"), "Output is not a valid PNG")
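
The PNG check above depends on the external file utility. A sketch of a check that does not shell out compares the first eight bytes against the fixed PNG signature. The names is_png_data and is_png_file are hypothetical, and a matching signature only shows that the file starts like a PNG, not that the whole image is intact:

```lua
-- The eight-byte PNG signature: 137 80 78 71 13 10 26 10
local PNG_SIGNATURE = "\137PNG\r\n\26\n"

-- Hypothetical helper: true if the given bytes begin with the PNG signature.
local function is_png_data(bytes)
    return type(bytes) == "string" and bytes:sub(1, 8) == PNG_SIGNATURE
end

-- Hypothetical helper: true if the file at path begins with the PNG signature.
local function is_png_file(path)
    local handle = io.open(path, "rb")
    if not handle then
        return false
    end
    local header = handle:read(8)  -- nil for an empty file
    handle:close()
    return header == PNG_SIGNATURE
end
```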

Error Handling

Error	Behavior
Endpoint not configured	Error immediately: "stable_diffusion.endpoint not configured"
API unreachable	Error with connection details, suggest checking IP:port
Timeout	Retry up to max_retries times (if retry_on_timeout is enabled), then error with timeout duration
Invalid response	Error with response snippet for debugging
Base64 decode failure	Error with decode command output
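
The timeout row in the table above could be implemented with a small retry wrapper. This is a sketch under stated assumptions: with_retries is a hypothetical name, the wrapped function is assumed to return a result on success or nil plus an error message on failure, and the caller supplies the predicate that decides which errors count as retryable (for example, a curl timeout):

```lua
-- Hypothetical helper: calls fn once, then up to max_retries more times
-- while it keeps failing with an error that is_retryable accepts.
-- Returns result, nil, attempts on success or nil, message, attempts on failure.
local function with_retries(fn, max_retries, is_retryable)
    local attempts = 0
    local result, err
    repeat
        attempts = attempts + 1
        result, err = fn()
        if result ~= nil then
            return result, nil, attempts
        end
    until attempts > max_retries or not is_retryable(err)
    local message = string.format(
        "failed after %d attempt(s): %s", attempts, tostring(err)
    )
    return nil, message, attempts
end

-- Usage with a stand-in for the real API call: fails twice, then succeeds.
local calls = 0
local function flaky_request()
    calls = calls + 1
    if calls < 3 then
        return nil, "timeout"
    end
    return "image-data"
end

local function is_timeout(err)
    return err == "timeout"
end

local result, err, attempts = with_retries(flaky_request, 2, is_timeout)
```

With max_retries = 2 the wrapper makes at most three calls in total, and non-retryable errors such as an invalid response fail on the first attempt.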

Edge Cases

Endpoint without protocol: Auto-prepend "http://" if missing
Trailing slash: Normalize endpoint URL
Very large images: Warn if width/height > 2048 (slow generation)
Empty prompt: Error immediately (SD may hang or error)
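
The edge cases above can be sketched as two small pre-flight helpers. The names normalize_endpoint and check_request are hypothetical, and the exact error and warning strings are illustrative apart from the endpoint message quoted in the error table:

```lua
-- Hypothetical helper: returns a cleaned endpoint URL, or nil plus an error.
local function normalize_endpoint(endpoint)
    if type(endpoint) ~= "string" or endpoint:match("^%s*$") then
        return nil, "stable_diffusion.endpoint not configured"
    end
    -- Trim surrounding whitespace
    local url = endpoint:match("^%s*(.-)%s*$")
    -- Prepend http:// when no scheme such as http:// or https:// is present
    if not url:match("^%a[%w+.%-]*://") then
        url = "http://" .. url
    end
    -- Strip trailing slashes so path concatenation stays predictable
    url = url:gsub("/+$", "")
    return url
end

-- Hypothetical helper: rejects empty prompts and flags very large sizes.
-- Returns true plus an optional warning, or nil plus an error message.
local function check_request(prompt, width, height)
    if type(prompt) ~= "string" or prompt:match("^%s*$") then
        return nil, "prompt is empty"
    end
    local warning = nil
    if width > 2048 or height > 2048 then
        warning = "width/height above 2048 may generate slowly"
    end
    return true, warning
end
```

Normalizing once in init keeps later string.format calls free of doubled slashes in the request URL.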

Metadata

Status: Open
Created: 2026-01-28
Phase: 13 (Audio-Visual Generation)
Estimated Complexity: Medium (HTTP API integration)
Dependencies: Local stable diffusion instance running
Blocks: 13-003c, 13-003d