issues/13-004c-implement-transition-effects.md
Issue 13-004c: Implement Transition Effects (Post-MVP)
Priority
Low (enhancement, blocked by MVP)
Parent Issue
13-004: Assemble Video from TTS Audio and Generated Images
Blocked By
13-004b: Implement ffmpeg Video Assembly (MVP Sharp Cuts)
Note: This issue should only be started after 13-004b (MVP with sharp cuts) is complete and validated. The architecture should be designed with transitions in mind, but implementation is deferred.
Current Behavior
After 13-004b completes, video assembly works with sharp cuts — each frame displays for its duration, then instantly switches to the next frame. This is functional but visually abrupt.
Intended Behavior
Add optional transition effects between frames:
- Crossfade: Smooth alpha blend between adjacent frames
- Dissolve: Fade out → black → fade in
- Variable timing: Longer transitions for semantically distant words
Transition Types
1. Crossfade
Smooth blend between frame A and frame B over a configurable duration:
Time: |--A 100%--|--blend--|--B 100%--|
Alpha A: 1.0 -----> 0.5 ----> 0.0
Alpha B: 0.0 -----> 0.5 ----> 1.0
ffmpeg approach:
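# Using xfade filter; -loop 1 -t 1 turns each still image into a 1-second stream
ffmpeg -loop 1 -t 1 -i frame_A.png -loop 1 -t 1 -i frame_B.png \
  -filter_complex "[0][1]xfade=transition=fade:duration=0.2:offset=0.8,format=yuv420p" \
  output.mp4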
For a full sequence, this becomes complex; it may be necessary to generate intermediate blend frames or use a more sophisticated approach.
2. Dissolve Through Black
Frame A fades to black, then frame B fades in from black:
Time: |--A 100%--|--fade out--|--black--|--fade in--|--B 100%--|
ffmpeg approach:
# Fade out A, then fade in B (-loop 1 repeats the still image so the fade has frames to act on)
ffmpeg -loop 1 -i frame_A.png -vf "fade=t=out:st=0.8:d=0.2" -t 1 a_fadeout.mp4
ffmpeg -loop 1 -i frame_B.png -vf "fade=t=in:st=0:d=0.2" -t 1 b_fadein.mp4
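The two commands above hard-code a 1-second clip and a 0.2-second fade. A minimal sketch of how the same commands could be built from Lua for arbitrary timings follows. The function name `build_dissolve_commands`, its parameters, and the fixed output file names are assumptions for illustration, not part of the existing codebase.

```lua
-- Hypothetical helper: returns the ffmpeg commands for a dissolve through black.
-- clip_s is the length of each clip in seconds, fade_s the length of each fade.
local function build_dissolve_commands(frame_a, frame_b, clip_s, fade_s)
  assert(fade_s > 0 and fade_s <= clip_s, "fade must fit inside the clip")
  -- Frame A: hold, then fade to black over the last fade_s seconds.
  local fade_out = string.format(
    'ffmpeg -y -loop 1 -i "%s" -vf "fade=t=out:st=%.3f:d=%.3f" -t %.3f -pix_fmt yuv420p a_fadeout.mp4',
    frame_a, clip_s - fade_s, fade_s, clip_s)
  -- Frame B: fade in from black over the first fade_s seconds, then hold.
  local fade_in = string.format(
    'ffmpeg -y -loop 1 -i "%s" -vf "fade=t=in:st=0:d=%.3f" -t %.3f -pix_fmt yuv420p b_fadein.mp4',
    frame_b, fade_s, clip_s)
  return fade_out, fade_in
end
```

The two resulting clips would still need to be joined, for example with the same concat workflow used for sharp cuts.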
3. Variable Timing Based on Semantic Distance
Adjust transition duration based on how semantically different adjacent frames are:
- Very similar words → quick transition (100ms)
- Very different words → slow transition (500ms)
local function calculate_transition_duration(frame_a, frame_b, config)
  local similarity = cosine_similarity(
    frame_a.center_embedding,
    frame_b.center_embedding
  )
  -- Cosine similarity can be negative; clamp to [0, 1] so the result
  -- stays within [min_ms, max_ms]
  similarity = math.max(0, math.min(1, similarity))
  -- Map similarity to duration
  -- similarity 1.0 (identical) → min_duration
  -- similarity 0.0 (orthogonal) → max_duration
  local min_ms = config.min_transition_ms or 100
  local max_ms = config.max_transition_ms or 500
  return min_ms + (1 - similarity) * (max_ms - min_ms)
end
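The function above calls `cosine_similarity`, which this issue does not define. A minimal sketch is shown below, assuming each embedding is loaded as a plain Lua array of numbers of equal length. How the embeddings are actually stored after loading is an assumption here.

```lua
-- Sketch of cosine similarity over two equal-length numeric arrays.
-- Returns a value in [-1, 1]; a zero vector is treated as unrelated (0).
local function cosine_similarity(a, b)
  assert(#a == #b, "embeddings must have the same length")
  local dot, norm_a, norm_b = 0, 0, 0
  for i = 1, #a do
    dot = dot + a[i] * b[i]
    norm_a = norm_a + a[i] * a[i]
    norm_b = norm_b + b[i] * b[i]
  end
  if norm_a == 0 or norm_b == 0 then
    return 0
  end
  return dot / (math.sqrt(norm_a) * math.sqrt(norm_b))
end
```

Because the result can be negative for opposing vectors, any mapping from similarity to duration needs to decide how to treat values below zero.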
Technical Design
Configuration
-- In config.lua:
trance_video = {
  -- ... base settings from 13-004b ...

  -- Transition settings
  transition = "sharp",          -- "sharp" (MVP), "crossfade", "dissolve"
  transition_duration_ms = 200,  -- Fixed duration for crossfade/dissolve

  -- Variable transition (optional)
  variable_transitions = false,
  min_transition_ms = 100,
  max_transition_ms = 500,
}
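Since these settings are optional, it may help to resolve defaults and reject invalid values in one place before assembly starts. The following is a sketch only; `resolve_transition_config` is a hypothetical helper, and the specific validation rules are assumptions derived from the comments in the config above.

```lua
-- Allowed values for the `transition` setting, taken from the config comment.
local VALID_TRANSITIONS = { sharp = true, crossfade = true, dissolve = true }

-- Hypothetical helper: fills in defaults and validates transition settings.
-- Returns the resolved table, or nil plus an error message.
local function resolve_transition_config(cfg)
  cfg = cfg or {}
  local resolved = {
    transition = cfg.transition or "sharp",
    transition_duration_ms = cfg.transition_duration_ms or 200,
    variable_transitions = cfg.variable_transitions or false,
    min_transition_ms = cfg.min_transition_ms or 100,
    max_transition_ms = cfg.max_transition_ms or 500,
  }
  if not VALID_TRANSITIONS[resolved.transition] then
    return nil, "unknown transition: " .. tostring(resolved.transition)
  end
  if resolved.transition_duration_ms <= 0 then
    return nil, "transition_duration_ms must be positive"
  end
  if resolved.min_transition_ms > resolved.max_transition_ms then
    return nil, "min_transition_ms must not exceed max_transition_ms"
  end
  return resolved
end
```

Defaulting `transition` to "sharp" keeps the 13-004b behavior unchanged when no transition settings are present.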
Implementation Approaches
Approach A: Pre-render Blend Frames
Generate intermediate frames for transitions before assembly:
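-- {{{ local function generate_blend_frames
local function generate_blend_frames(frame_a_path, frame_b_path, output_dir, num_steps)
  -- Use ImageMagick to generate the blend sequence from frame A toward frame B
  for i = 1, num_steps do
    local alpha = i / (num_steps + 1) -- weight of frame B at this step
    local b_pct = math.floor(alpha * 100 + 0.5)
    local a_pct = 100 - b_pct -- percentages always sum to 100
    local output_path = string.format("%s/blend_%03d.png", output_dir, i)
    -- compose:args is src_percent,dst_percent; with `convert A B ... -composite`,
    -- A is the destination and B is the source, so B's weight comes first
    local cmd = string.format(
      'convert "%s" "%s" -compose blend -define compose:args=%d,%d -composite "%s"',
      frame_a_path, frame_b_path,
      b_pct, a_pct,
      output_path
    )
    os.execute(cmd)
  end
end
-- }}}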
Update concat.txt to include blend frames with short durations.
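One way the concat file contents could be generated is sketched below. It assumes the ffmpeg concat demuxer format (`file` and `duration` lines), that each frame is a table with hypothetical `path` and `duration_s` fields, and that the transition time is taken out of the preceding frame's hold time so the total length stays in sync with the audio. The function name `build_concat_lines` is also an assumption.

```lua
-- Hypothetical helper: builds concat demuxer text with blend frames interleaved.
-- frames:       array of { path = string, duration_s = number }
-- blends:       blends[i] is an array of blend image paths between frame i and i+1
-- transition_s: total transition length in seconds, split evenly across blend frames
local function build_concat_lines(frames, blends, transition_s)
  local lines = {}
  for i, frame in ipairs(frames) do
    local steps = blends[i] or {}
    local hold = frame.duration_s
    if #steps > 0 then
      hold = hold - transition_s -- keep the overall length unchanged
    end
    lines[#lines + 1] = string.format("file '%s'", frame.path)
    lines[#lines + 1] = string.format("duration %.3f", hold)
    for _, blend_path in ipairs(steps) do
      lines[#lines + 1] = string.format("file '%s'", blend_path)
      lines[#lines + 1] = string.format("duration %.3f", transition_s / #steps)
    end
  end
  -- The concat demuxer tends to ignore the final duration unless the
  -- last file is listed once more without a duration line.
  if #frames > 0 then
    lines[#lines + 1] = string.format("file '%s'", frames[#frames].path)
  end
  return table.concat(lines, "\n") .. "\n"
end
```

With variable timing, `transition_s` would become a per-pair value rather than a single argument.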
Approach B: ffmpeg Filter Complex
Use ffmpeg's xfade filter for transitions (complex for long sequences):
# For 3 frames with crossfade:
ffmpeg -loop 1 -t 1 -i frame1.png \
-loop 1 -t 1 -i frame2.png \
-loop 1 -t 1 -i frame3.png \
-filter_complex \
"[0][1]xfade=transition=fade:duration=0.2:offset=0.8[v1]; \
[v1][2]xfade=transition=fade:duration=0.2:offset=1.6[v2]" \
-map "[v2]" output.mp4
This approach becomes unwieldy for hundreds of frames.
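If this approach were pursued anyway, the filter string would have to be generated rather than written by hand. The sketch below shows the offset arithmetic under the assumption that input k is a still image looped for `durations[k]` seconds; `build_xfade_filter` is a hypothetical name. Each transition starts at the summed duration of the inputs consumed so far, minus one fade length for every transition up to and including the current one.

```lua
-- Hypothetical helper: builds a chained xfade filter_complex string.
-- durations: per-input clip lengths in seconds (each should be >= fade_s)
-- Returns the filter string and the label of the final output stream.
local function build_xfade_filter(durations, fade_s)
  assert(#durations >= 2, "need at least two inputs")
  local parts = {}
  local elapsed = 0
  local prev = "[0]"
  for k = 1, #durations - 1 do
    elapsed = elapsed + durations[k]
    -- Every earlier crossfade shortened the running output by fade_s.
    local offset = elapsed - k * fade_s
    local out = string.format("[v%d]", k)
    parts[#parts + 1] = string.format(
      "%s[%d]xfade=transition=fade:duration=%.3f:offset=%.3f%s",
      prev, k, fade_s, offset, out)
    prev = out
  end
  return table.concat(parts, ";"), prev
end
```

For three 1-second inputs and a 0.2-second fade this yields offsets 0.8 and 1.6, matching the hand-written example above. The command line still grows by one `-i` per frame.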
Approach C: Video Editing Library
Use a video editing library (e.g., MoviePy for Python) that handles transitions natively:
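from moviepy.editor import ImageClip, concatenate_videoclips  # MoviePy 1.x API
fade = 0.2  # crossfade length in seconds
clips = [ImageClip(f).set_duration(d) for f, d in frames_with_durations]
# Each clip after the first fades in over the tail of the previous one
clips = [clips[0]] + [c.crossfadein(fade) for c in clips[1:]]
video = concatenate_videoclips(clips, method="compose", padding=-fade)
video.write_videofile("output.mp4", fps=24)  # ImageClips carry no fps of their own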
This requires a Python dependency but may be cleaner for complex transitions.
Recommended Approach
Approach A (pre-render blend frames) is recommended because:
- Stays within the Lua + shell ecosystem
- Works with existing ffmpeg concat workflow
- Enables variable transition durations
- ImageMagick is likely already installed
Suggested Implementation Steps
- Verify 13-004b MVP works: Don't break sharp cuts
- Add transition config: New settings in config.lua
- Implement blend frame generation: ImageMagick composite
- Update concat file generation: Include blend frames with durations
- Implement variable timing: Calculate from semantic distance
- Add CLI flags: --video-transition, --transition-duration
- Test all transition types: Verify smooth playback
- Benchmark: Measure additional processing time
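As a rough illustration of the CLI step, the sketch below parses the transition flags from an argument array. How `run.sh` forwards arguments to Lua is not specified in this issue, so the `args` table, the function name `parse_transition_flags`, and the returned field names are all assumptions. The `--variable-transitions` flag is included because the Testing section uses it.

```lua
-- Hypothetical parser for the transition-related CLI flags.
-- args: array of strings, e.g. the global `arg` table of a Lua script.
local function parse_transition_flags(args)
  local opts = {}
  local i = 1
  while i <= #args do
    local flag = args[i]
    if flag == "--video-transition" then
      opts.transition = args[i + 1]
      i = i + 2
    elseif flag == "--transition-duration" then
      opts.transition_duration_ms = tonumber(args[i + 1])
      i = i + 2
    elseif flag == "--variable-transitions" then
      opts.variable_transitions = true
      i = i + 1
    else
      i = i + 1 -- not a transition flag; leave it for other parsers
    end
  end
  return opts
end
```

Values parsed this way would override the corresponding entries in `trance_video` from config.lua.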
Deliverables
- [ ] Transition configuration schema in config.lua
- [ ] Crossfade implementation (pre-rendered blend frames)
- [ ] Dissolve implementation (fade through black)
- [ ] Variable transition timing based on semantic distance
- [ ] Updated concat file generation for transitions
- [ ] CLI flags: --video-transition, --transition-duration
- [ ] Documentation of transition types and tradeoffs
Testing
# Test crossfade
./run.sh --trance-video --video-transition crossfade --transition-duration 200
# Test dissolve
./run.sh --trance-video --video-transition dissolve
# Test variable timing
./run.sh --trance-video --video-transition crossfade --variable-transitions
# Compare file sizes
ls -la output/flopsopoly/trance-video-*.mp4
Performance Notes
Transitions add processing time:
- Pre-rendered blend frames: +2-5 frames per transition × N transitions
- For 700 transitions with 5 blend frames each: 3,500 additional frames
- ImageMagick blend: ~0.1-0.5s per frame
- Total additional time: 6-30 minutes
Consider:
- Reducing blend frame count (3 instead of 5)
- Parallel blend generation
- Caching blend frames
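The arithmetic behind these estimates can be written out as a small helper, which makes it easy to compare the mitigation options. This is a sketch: `estimate_blend_cost` is a hypothetical function, and the `workers` parameter assumes ideal linear speedup from parallel generation, which real runs are unlikely to reach.

```lua
-- Hypothetical estimator for the extra cost of pre-rendered blend frames.
-- Returns the number of additional frames and the estimated minutes to render them.
local function estimate_blend_cost(num_transitions, blend_frames, sec_per_frame, workers)
  workers = workers or 1 -- idealized parallelism; assumes perfect scaling
  local extra_frames = num_transitions * blend_frames
  local minutes = extra_frames * sec_per_frame / workers / 60
  return extra_frames, minutes
end
```

For example, `estimate_blend_cost(700, 5, 0.1)` and `estimate_blend_cost(700, 5, 0.5)` give the lower and upper ends of the range quoted above, while `estimate_blend_cost(700, 3, 0.5, 4)` models three blend frames spread over four workers.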
Edge Cases
- ImageMagick not installed: Error with install instructions, fallback to sharp
- Variable transitions with no embeddings: Fallback to fixed duration
- Very short frame duration: Skip transition if frame < transition duration
- First/last frame: No transition before first or after last
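Most of these rules can be collected into a single decision function, sketched below. It assumes each frame table carries a hypothetical `duration_ms` field and an optional `center_embedding`, and that the variable-duration calculation is passed in as `duration_fn` so the sketch stays self-contained. The ImageMagick check is left out because it depends on the host environment.

```lua
-- Hypothetical helper: how long the transition between frames[index] and
-- frames[index + 1] should be, in milliseconds. 0 means "use a sharp cut".
local function transition_ms_between(frames, index, cfg, duration_fn)
  local a, b = frames[index], frames[index + 1]
  if cfg.transition == "sharp" or not a or not b then
    return 0 -- sharp cuts, or nothing before the first / after the last frame
  end
  local ms = cfg.transition_duration_ms
  -- Variable timing only applies when both embeddings are available;
  -- otherwise fall back to the fixed duration.
  if cfg.variable_transitions and a.center_embedding and b.center_embedding then
    ms = duration_fn(a, b, cfg)
  end
  if a.duration_ms < ms or b.duration_ms < ms then
    return 0 -- a frame is shorter than the transition, so skip it
  end
  return ms
end
```

Returning 0 rather than an error lets the caller fall back to the existing sharp-cut path for that pair of frames.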
Future Enhancements
These could be additional sub-issues if needed:
- Ken Burns effect: Slow zoom/pan on each frame
- Morph transitions: Use img2img to generate intermediate frames
- Audio-reactive transitions: Sync transition timing to audio features
- Subtitle overlay: Display word text synchronized with audio
Related Documents
- Issue 13-004: Assemble Video (parent)
- Issue 13-004b: Implement ffmpeg Video Assembly (MVP, must complete first)
- Issue 13-003c: Implement Single-Pass Image Generation Pipeline (provides frames)
- assets/embeddings/embeddinggemma_latest/word_embeddings.json: For semantic distance
Metadata
- Status: Open (blocked by 13-004b)
- Created: 2026-01-28
- Phase: 13 (Audio-Visual Generation)
- Estimated Complexity: Medium-High (video processing + semantic integration)
- Dependencies: 13-004b (MVP must work first)
- Blocks: None (enhancement)