issues/8-progress.md

Phase 8 Progress Report

Phase 8 Goals

"Website Completion"

Phase 8 focuses on completing the website generation pipeline so that run.sh produces a fully deployable static website with all navigation working.

From Phase 7

  • Pipeline executes with zero warnings and errors
  • Output is clean, minimal, and informative
  • All paths displayed as relative paths
  • Validation statistics are accurate

Phase 8 Objectives

  • Integrate complete HTML generation into automated pipeline
  • Generate all similarity-sorted pages (similar/XXX.html)
  • Generate all diversity-sorted pages (different/XXX.html)
  • Rename "unique" to "different" for clarity
  • Ensure all navigation links are functional

Phase 8 Issues

Active Issues

| Issue | Description | Status | Priority |
|-------|-------------|--------|----------|
| 8-001 | Unified website generation pipeline | In Progress | High |
| 8-002 | Implement multi-threaded HTML generation | In Progress | High |
| 8-016 | Validate poem representation in pagination | Open | Medium |
| 8-020 | Hybrid pagination strategy (45GB constraint) | In Progress | High |
| 8-031 | Convert between similarity matrix formats | Open | Low |
| 8-034 | Refactor to triangular individual files only | Open | Low |
| 8-058 | Eliminate main thread / worker code duplication | Open | Medium |

Completed Issues

| Issue | Description | Status | Completed |
|-------|-------------|--------|-----------|
| 8-059 | Unify temp/ and tmp/ into a single tmpfs-symlinked tmp/ | Completed | 2026-06-15 |
| 8-057 | Boost visual formatting | Completed | 2026-01-29 |
| 8-045 | Timeline-based progress bar calculation | Completed | 2026-01-30 |
| 8-040 | Add images to similar/different pages | Completed | 2026-01-28 |
| 8-011 | Scrape fediverse boost content | Completed (279/458 scraped) | 2026-01-28 |
| 8-012 | Implement paginated similarity chapters | Completed | 2026-01-21 |
| 8-038 | Center poem containers on page | Completed | 2026-01-21 |
| 8-048 | Flatten media directory for deployment | Completed | 2026-01-23 |
| 8-051 | Order poem index categories by ascending count | Completed | 2026-01-26 |
| 8-052 | Normalize vertical bar characters in HTML output | Completed | 2026-01-26 |
| 8-053 | Add image title attribute and fix alt-text fallback | Completed | 2026-01-26 |
| 8-049 | Implement audio and video playback | Completed | 2026-01-28 |
| 8-054 | Extract image attachments from Matrix media messages | Completed | 2026-01-28 |
| 8-041 | Escape HTML characters in poem content | Completed | 2026-01-28 |
| 8-050 | Enhance word-cloud semantic similarity pages | Completed | 2026-01-28 |
| 8-050e | Centroid-based chronological link for word pages | Completed | 2026-01-28 |
| 8-050c | Apply word color to word-page header | Completed | 2026-01-28 |
| 8-050b | Color-contextualized similarity ranking | Completed | 2026-01-28 |
| 8-050d | Configurable poems-per-word-page via run.sh | Completed | 2026-01-28 |
| 8-050a | Compute semantic color for word-cloud words | Completed | 2026-01-28 |
| 8-055 | Fix golden poem formatting on similar/different pages | Completed | 2026-01-28 |
| 8-056 | Preserve whitespace in poem rendering (shared formatter) | Completed | 2026-01-28 |
| 8-029 | Consolidate similarity matrix functions | Completed | 2026-01-04 |
| 8-028 | Clean output directory of test/demo files | Completed | 2026-01-04 |
| 8-027 | Implement extendable diversity cache | Completed | 2026-01-04 |
| 8-026 | Add diversity computation progress display | Completed | 2026-01-04 |
| 8-025 | Fix diversity cache validation order bug | Completed | 2026-01-04 |
| 8-024 | Improve similarity matrix progress display | Completed | 2026-01-04 |
| 8-023 | Fix similarity matrix function call in run.sh | Completed | 2026-01-04 |
| 8-022 | Add pagination CLI flags to HTML generation | Completed | 2026-01-04 |
| 8-021 | Fix embedding progress counter overcounting | Completed | 2026-01-04 |
| 8-019 | Implement unique poem_index system | Completed | 2025-12-25 |
| 8-018 | Fix embedding directory case inconsistency | Completed | 2025-12-25 |
| 8-030 | Add chronological anchor links | Completed | 2026-01-09 |
| 8-003 | Remove remaining CSS from HTML generation | Completed | 2025-12-23 (reopened, re-completed) |
| 8-004 | Implement embedding validation and empty poem handling | Completed | 2025-12-14 |
| 8-006 | Fix golden poem box-drawing format | Completed | 2025-12-15 |
| 8-007 | Add box-drawing borders around navigation links | Completed | 2025-12-15 |
| 8-008 | Implement configurable centroid embedding system | Completed | 2025-12-23 |
| 8-009 | Project cleanup and organization | Completed | 2025-12-17 |
| 8-010 | Fix note filenames in generated HTML | Completed | 2025-12-23 |
| 8-013 | Implement TXT export functionality | Completed | 2025-12-23 |
| 8-015 | Implement ZIP extraction freshness check | Completed | 2025-12-23 |
| 8-005 | Integrate images into HTML output | Completed | 2025-12-23 |
| 8-035 | Colorize nav boxes according to progress bar | Completed | 2026-01-21 |
| 8-036 | Add poem identification to ranking headers | Completed | 2026-01-21 |
| 8-039 | Move chronological files to subdirectory | Completed | 2026-01-21 |
| 8-042 | Sync images from configurable directories | Completed | 2026-01-21 |
| 8-043 | Generate semantic word cloud page (full) | Completed | 2026-01-21 |
| 8-046 | Create menu navigation page | Completed | 2026-01-21 |
| 8-047 | Implement dark mode (always on) | Completed | 2026-01-21 |
| 8-005 | Integrate images into HTML output (re-completed) | Completed | 2026-01-21 |
| 8-030 | Add chronological anchor links (re-completed) | Completed | 2026-01-21 |
| 8-037 | Fix similar/different box alignment (re-completed) | Completed | 2026-01-21 |

8-047: Implement Dark Mode (Always On) - COMPLETED (2026-01-21)

  • ✅ Updated 17 <body> tags across 6 generator files
  • ✅ True black background (#000000), white text (#FFFFFF)
  • ✅ Accessible link colors (#6699FF, #9966FF)
  • ✅ CSS-free using HTML bgcolor/text/link/vlink attributes
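
As a rough illustration of the CSS-free approach, the sketch below builds a `<body>` tag from the colors listed above. It is written in Python for brevity and is not taken from the project's Lua generators. The function name is hypothetical, and assigning `#6699FF` to `link` and `#9966FF` to `vlink` is an assumption about which color is which.

```python
def dark_mode_body_tag(bgcolor="#000000", text="#FFFFFF",
                       link="#6699FF", vlink="#9966FF"):
    """Build a <body> tag that sets page colors with HTML attributes only."""
    # Assumption: the first listed link color is for unvisited links,
    # the second for visited links.
    return f'<body bgcolor="{bgcolor}" text="{text}" link="{link}" vlink="{vlink}">'


print(dark_mode_body_tag())
```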

8-046: Create Menu Navigation Page - COMPLETED (2026-01-21)

  • ✅ Changed header "How to explore" link to "Menu" → wordcloud.html
  • ✅ Added poem index section to wordcloud.html
  • ✅ Poems grouped by category (fediverse, notes, messages, bluesky)
  • ✅ Each poem links to its chronological position

8-043: Generate Semantic Word Cloud Page - RE-COMPLETED (2026-01-21)

  • ✅ Word cloud words now link to wordcloud/{word}.html similarity pages
  • ✅ Created src/generate-word-pages.lua for generating word similarity pages
  • ✅ Word embeddings cached to word_embeddings.json
  • ✅ Pages show top 50 most similar poems per word

8-005: Integrate Images into HTML Output - RE-COMPLETED (2026-01-21)

  • ✅ Fixed viewport overflow with style="max-width:100%; height:auto"
  • ✅ Pragmatic CSS exception approved for responsive images
  • ✅ Maintains aspect ratio, prevents horizontal scroll

8-030: Add Chronological Anchor Links - RE-COMPLETED (2026-01-21)

  • ✅ Fixed anchor ID format mismatch in effil worker
  • ✅ Chronological pages: id="poem-fediverse-4210" (full category)
  • ✅ Similar/different links: href="...#poem-fediverse-5000" (now matches)
  • ✅ Pagination-aware links working: chronological/05.html#poem-fediverse-5000
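
The anchor format above can be sketched as follows. This is an illustrative Python version, not the project's Lua code: `chronological_href` is a hypothetical helper, and the two-digit zero padding of the page number is inferred from the `chronological/05.html` example.

```python
def get_poem_anchor_id(category, poem_id):
    """Anchor ID using the full category name, e.g. poem-fediverse-4210."""
    return f"poem-{category}-{poem_id}"


def chronological_href(category, poem_id, page_number):
    """Link to a poem's anchor on its chronological page (hypothetical helper)."""
    # Assumption: page numbers are zero-padded to two digits.
    return f"chronological/{page_number:02d}.html#{get_poem_anchor_id(category, poem_id)}"


print(chronological_href("fediverse", 5000, 5))
```

Building both the `id` attribute and the `href` from one helper is what keeps the two formats from drifting apart again.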

8-037: Fix Similar/Different Box Alignment - RE-COMPLETED (2026-01-21)

  • ✅ Fixed bottom progress bar off-by-one in effil worker
  • ✅ Corrected build_segment() call positions (1 to LEFT_JUNCTION, not 0)
  • ✅ Junction characters now align with box corners
  • ✅ Similar/different pages match chronological page formatting

8-039: Move Chronological Files to Subdirectory - RE-COMPLETED (2026-01-21)

  • ✅ Changed output from chronological-XX.html to chronological/XX.html
  • ✅ Updated generate_chronological_page_navigation() for relative paths
  • ✅ Updated redirect/index file to point to chronological/index.html
  • ✅ Updated similar/different page links to chronological/ subdirectory
  • ✅ Manually deleted 79 old chronological-*.html files from output root
  • ✅ Fixed internal pagination links (%s.html instead of chronological-%s.html)

8-042: Sync Images From Configurable Directories - COMPLETED (2026-01-21)

  • ✅ Added image_sync section to config/input-sources.json
  • ✅ Added sync_images_from_config() function to scripts/update-words
  • ✅ Uses jq for JSON parsing, rsync for file syncing
  • ✅ Supports multiple source directories with preserve_structure option
  • ✅ Reports per-source and total sync statistics

8-043: Generate Semantic Word Cloud Page (MVP) - COMPLETED (2026-01-21)

  • ✅ Created config/stop-words.txt with 271 categorized stop words
  • ✅ Created src/wordcloud-generator.lua with frequency-based sizing
  • ✅ Added word_cloud section to config/input-sources.json
  • ✅ Updated Issue 10-003 with vimfolded config sections
  • ✅ 200 words displayed from 23,455 unique (after filtering)
  • ⏳ Future: embedding-based semantic weighting, word similarity pages

8-035: Colorize Nav Boxes According to Progress Bar - COMPLETED (2026-01-21)

  • ✅ Added colorize_char() helper function for color wrapping
  • ✅ Modified generate_regular_corner_box_top() with progressive colorization
  • ✅ Modified generate_regular_corner_box_nav_line() with wall colorization
  • ✅ Updated effil worker thread with equivalent inline functions
  • ✅ Left box (positions 0-10) colorizes as progress reaches them
  • ✅ Right box (positions 70-82) colorizes when progress passes 70
  • ✅ Uses poem's semantic color (red, blue, green, purple, orange, yellow, gray)

8-036: Add Poem Identification to Ranking Headers - COMPLETED (2026-01-21)

  • ✅ Added get_source_path() helper to effil worker thread
  • ✅ Ranking headers now show: --- #N category/identifier ---
  • ✅ Notes: notes/source_file (e.g., "notes/what-a-lame-movie")
  • ✅ Bluesky: bluesky#N (e.g., "bluesky#42")
  • ✅ Fediverse: fediverse/N (e.g., "fediverse/1234")
  • ✅ Messages: messages/N (e.g., "messages/567")
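
A minimal Python sketch of the header formats listed above. The real `get_source_path()` lives in the Lua worker thread; the dictionary keys used here (`category`, `id`, `source_file`) are assumptions made for illustration.

```python
def get_source_path(poem):
    """Return the category/identifier label for a poem record."""
    category = poem["category"]
    if category == "notes":
        return f"notes/{poem['source_file']}"   # notes use the source filename
    if category == "bluesky":
        return f"bluesky#{poem['id']}"          # bluesky uses '#' as separator
    return f"{category}/{poem['id']}"           # fediverse, messages


def ranking_header(rank, poem):
    """Format a ranking header as: --- #N category/identifier ---"""
    return f"--- #{rank} {get_source_path(poem)} ---"


print(ranking_header(1, {"category": "notes", "source_file": "what-a-lame-movie"}))
print(ranking_header(2, {"category": "bluesky", "id": 42}))
```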

8-030: Add Chronological Anchor Links - COMPLETED (2026-01-09)

  • ✅ Created get_poem_anchor_id() helper function
  • ✅ Updated navigation functions for three-part layout
  • ✅ Added chronological links centered between similar/different
  • ✅ Added HTML anchor IDs to chronological.html
  • ✅ Layout: │ similar │ chronological │ different │
  • ✅ Test: 100 chronological links found in generated pages
  • ✅ Format: chronological.html#poem-{category}-{id}

Issue Details

8-001: Unified Website Generation Pipeline - IN PROGRESS (Phases 1-6 complete)

  • ✅ Phase 1: HTML Integration - complete
  • ✅ Phase 2: Parallel HTML Generation - complete
  • ✅ Phase 3: Embedding Integration - complete (2025-12-25)
      • --generate-embeddings flag with freshness checks
      • --model option for model selection
      • All 7,844 poems have embeddings (100% complete as of 2026-01-04)
  • ✅ Phase 4: Similarity Matrix Integration - complete (2025-12-25)
      • --generate-similarity flag with dependency validation
      • Run ./scripts/validate-pipeline-data --quick to check current completion
  • ✅ Phase 5: Diversity Cache Integration - complete (2025-12-25)
      • --generate-diversity flag with freshness checks
      • Extendable cache with incremental saves implemented (8-027)
  • ✅ Phase 6: Pipeline Orchestration - complete (2025-12-25)
      • --full runs all 10 stages, --all runs fast stages (1-5, 9-10)
      • TUI updated with new stages (marked ⚠️ for expensive ones)
  • Pending: Full HTML regeneration with boost formatting (8-057)
      • Run ./run.sh --generate-html after merging boost formatting changes

8-002: Implement Multi-threaded HTML Generation - IN PROGRESS

  • ✅ Created scripts/generate-html-parallel using effil library
  • ✅ Similarity page generation working (10 pages/sec with 4 threads)
  • ✅ Batch-based thread pool with progress reporting
  • ✅ Difference page generation working (centroid-based diversity algorithm)
  • ✅ 62MB embeddings loaded and shared via effil.table
  • ✅ Option C optimization: scripts/precompute-diversity-sequences created
  • ✅ Thermal management with configurable sleep between batches
  • ✅ Cache-based fast path in generate-html-parallel
  • Pending: Run pre-computation (~42 hours), pipeline integration

8-003: Remove Remaining CSS from HTML Generation - COMPLETED (2025-12-23 re-completed)

  • ✅ Removed 3 <style> blocks from templates (Phase 1)
  • ✅ Replaced inline style= with <font color=""><b> tags (Phase 1)
  • ✅ Removed container div inline styles (Phase 1)
  • ✅ Verified: 0 style attributes, 15,576 font color tags in test output (Phase 1)
  • ✅ [PHASE 2] Removed remaining 4 style= attributes missed in Phase 1:
      • Image tags: style="max-width:100%%; height:auto;" (2 occurrences)
      • Pre tags: style="text-align: left; max-width: 90ch; margin: 0 auto;" (2 occurrences)
  • ✅ [PHASE 2] Templates now use plain <pre> tag without CSS
  • ✅ [PHASE 2] Verified: 0 style attributes in generated HTML
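
One way to reproduce the "0 style attributes" check is a small counter over the generated HTML. The sketch below is a hypothetical Python helper, not the verification the project actually ran. It uses simple regular expressions rather than an HTML parser, so it would miscount attribute values that contain `>`.

```python
import re

STYLE_BLOCK = re.compile(r"<style\b", re.IGNORECASE)
STYLE_ATTR = re.compile(r"<[^>]*\sstyle\s*=", re.IGNORECASE)
FONT_COLOR = re.compile(r"<font\s+color\s*=", re.IGNORECASE)


def count_css_usage(html_text):
    """Count <style> blocks, tags carrying style=, and <font color> tags."""
    return {
        "style_blocks": len(STYLE_BLOCK.findall(html_text)),
        "style_attributes": len(STYLE_ATTR.findall(html_text)),
        "font_color_tags": len(FONT_COLOR.findall(html_text)),
    }


sample = '<pre><font color="#FF0000"><b>poem</b></font></pre>'
print(count_css_usage(sample))
```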

8-004: Implement Embedding Validation and Empty Poem Handling - COMPLETED

  • ✅ Empty poems now get random embeddings (seeded by poem ID for reproducibility)
  • ✅ Random embeddings normalized to unit vectors
  • ✅ Added is_random = true flag to identify synthetic embeddings
  • ✅ Pre-flight validation in scripts/generate-html-parallel
  • ✅ Pre-flight validation in scripts/precompute-diversity-sequences
  • ✅ Scripts exit with helpful error if poems with content lack embeddings
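
The placeholder-embedding idea can be sketched as below. This is illustrative Python rather than the project's Lua, and the 768-dimension default, the Gaussian sampling, and the returned dictionary layout are all assumptions. The source only states that the vectors are random, seeded by poem ID, unit-length, and flagged with `is_random`.

```python
import math
import random


def random_unit_embedding(poem_id, dim=768):
    """Return a reproducible unit-length placeholder embedding for an empty poem."""
    rng = random.Random(poem_id)  # seeding by poem ID makes the vector reproducible
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return {
        "embedding": [x / norm for x in vec],  # normalized to unit length
        "is_random": True,                     # marks the embedding as synthetic
    }


first = random_unit_embedding(42, dim=8)
second = random_unit_embedding(42, dim=8)
print(first["embedding"] == second["embedding"])
```

Seeding per poem means a rebuild yields the same vector for the same empty poem, so similarity rankings stay stable between runs.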

8-005: Integrate Images into HTML Output - OPEN (initial entry; completed 2025-12-23, re-completed 2026-01-21)

  • Image catalog exists (539 images in assets/image-catalog.json)
  • flat-html-generator.lua does not consume image catalog
  • Need to associate images with poems via source metadata
  • Render <img> tags in poem HTML output

8-006: Fix Golden Poem Box-Drawing Format - COMPLETED

  • ✅ Rewrote apply_golden_poem_formatting() with proper 84-char box
  • ✅ Integrated progress bar colors into golden corners (╔═─┐ / ╚═─┘)
  • ✅ Added side borders with padding (║ content │) to each content line
  • ✅ Updated generate_progress_dashes() to 82-char border width
  • ✅ Fixed line splitting and text wrapping to preserve paragraph breaks
  • ✅ 244 golden poems now render correctly with 80-char content area

8-007: Add Box-Drawing Borders Around Navigation Links - COMPLETED

  • ✅ Added corner box separator line: ╟─────────┐ + gap + ┌───────────┤
  • ✅ Added corner box navigation line with vertical walls
  • ✅ Bottom border junctions adapt to progress: one junction character for the ═ section, another for the ─ section
  • ✅ Regular (non-golden) poems now have corner boxes connecting to progress bar
  • ✅ Left and right corner characters close regular poem corner boxes

8-013: Implement TXT Export Functionality - COMPLETED (2025-12-23)

  • render_attachment_images_txt() for [Image: alt-text] format
  • strip_html_tags() for removing HTML and decoding entities
  • generate_txt_file_header() for consistent file headers
  • generate_similarity_txt_file() with headers
  • generate_diversity_txt_file() with headers
  • M.generate_chronological_txt_file() created and integrated
  • ✅ Pipeline integration (regenerate-clean-site.lua, main.lua)
  • Note: Download links in HTML pages moved to 8-012 scope

8-012: Implement Paginated Similarity Chapters - IN PROGRESS (Phases A+B+C+D complete)

  • ✅ Circular dependency with 8-013 resolved
  • ✅ Added pagination config to config/input-sources.json
  • ✅ Documented minimum_pages setting requirement
  • ✅ Phase A: Core pagination logic implemented (10 new functions)
  • ✅ Phase B: Prev/next navigation implemented
  • ✅ Test: 134KB page with 100 poems, proper navigation
  • ✅ Phase C: Export Format Integration (2026-01-09)
      • Download links (.txt and .html archive) in every paginated page
      • Created generate_similarity_html_archive() and generate_diversity_html_archive()
      • Integrated into M.generate_flat_html_with_similarity_and_diversity()
      • Test: All links present and correctly formatted
  • ✅ Phase D: Generation Strategy - COMPLETED (2026-01-11)
      • Created parse_pages_specification() for --pages flag parsing ("1", "all", "1-10")
      • Integrated pagination into M.generate_complete_flat_html_collection()
      • Updated CLI argument parsing (utils.parse_cli_args)
      • Modified main.lua to pass pages parameter through pipeline
      • Test: Pagination generates correct filenames (similar/0001-01.html format)
      • Files use poem_index (numeric) for consistent naming
      • Default: generates page 1 only (minimum_pages=1)
  • Pending: Phase E - Integration (entry points, testing, max_pages enforcement)
  • ✅ Phase E Step 16 - Chronological stays as single file (per 8-020)
  • Related: 8-016 (validator) depends on this issue
  • Modified by 8-020: Hybrid pagination strategy
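
The `--pages` values mentioned above ("1", "all", "1-10") could be parsed roughly as follows. This Python sketch is an assumption about the behavior of the Lua `parse_pages_specification()`, not a port of it. The `max_pages` clamp and the `page_filename` helper are hypothetical, with zero padding inferred from the `similar/0001-01.html` example.

```python
def parse_pages_specification(spec, max_pages):
    """Turn a --pages value ("1", "all", "1-10") into a list of page numbers."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, max_pages + 1))
    if "-" in spec:
        start_text, end_text = spec.split("-", 1)
        start, end = int(start_text), int(end_text)
    else:
        start = end = int(spec)
    if start < 1 or end < start:
        raise ValueError(f"invalid pages specification: {spec!r}")
    # Assumption: requests beyond max_pages are clamped rather than rejected.
    return list(range(start, min(end, max_pages) + 1))


def page_filename(direction, poem_index, page):
    """Build a paginated filename such as similar/0001-01.html."""
    return f"{direction}/{poem_index:04d}-{page:02d}.html"


print(parse_pages_specification("1-10", max_pages=15))
print(page_filename("similar", 1, 1))
```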

8-020: Hybrid Pagination Strategy - OPEN

  • Storage constraint: 45 GB Neocities limit
  • Full chronological.html: All 7,793 poems (~12 MB) - NOT paginated
  • Paginated similar/different: Max 15 pages per poem at 100 poems per page = 1,500 poems per direction
  • Storage budget: ~38 GB used of 45 GB available
  • Reserved: ~31 MB for Phase 11 maze pages
  • Modifies 8-012, 8-016 validation scope

8-016: Validate Poem Representation in Pagination - OPEN

  • Depends on 8-012 completion
  • Post-generation validator to ensure all poems appear in output
  • Optional --fix flag to regenerate missing pages
  • Pipeline integration for deployment confidence
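
Since 8-016 is still open, the following is only one possible shape for such a validator, sketched in Python. It checks that every poem has a first page in each direction, using the `similar/0001-01.html` naming from 8-012. The function name, its inputs, and the choice to check only page 1 are assumptions.

```python
def find_missing_first_pages(poem_indices, generated_files,
                             directions=("similar", "different")):
    """List expected first pages that are absent from the generated output."""
    generated = set(generated_files)
    missing = []
    for direction in directions:
        for poem_index in poem_indices:
            expected = f"{direction}/{poem_index:04d}-01.html"
            if expected not in generated:
                missing.append(expected)
    return missing


files = ["similar/0001-01.html", "similar/0002-01.html", "different/0001-01.html"]
print(find_missing_first_pages([1, 2], files))
```

A `--fix` mode could feed the returned list back into the generator, though how regeneration would be triggered is not specified in this report.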

8-008: Implement Configurable Centroid Embedding System - COMPLETED

  • ✅ Created assets/centroids.json config with 5 example moods (melancholy, wonder, rage, tenderness, absurdity)
  • ✅ Implemented src/centroid-generator.lua for embedding generation via Ollama
  • ✅ Implemented recursive chunking algorithm for long content (not triggered for keyword-only centroids)
  • ✅ Created src/centroid-html-generator.lua for HTML page generation
  • ✅ Generated 11 files: index.html + 5 similar + 5 different pages in output/centroid/
  • ✅ Similarity scores verified working (0.72-0.78 range for top matches)
  • Use cases: themed entry points, mood-based exploration, curated collections

8-010: Fix Note Filenames in Generated HTML - COMPLETED

  • ✅ Created get_poem_display_filename() helper function
  • ✅ Notes now display original descriptive filenames (e.g., notes/what-a-lame-movie)
  • ✅ Fediverse/messages display numeric ID without .txt extension
  • ✅ Updated all 4 file header generation locations
  • ✅ Verified: notes show source_file, no .txt extensions anywhere

8-056: Preserve Whitespace in Poem Rendering - COMPLETED (2026-01-28)

  • ✅ Created libs/text-formatter.lua shared module for whitespace preservation
  • ✅ Updated main thread to preserve whitespace for ALL categories (not just notes)
  • ✅ Updated worker thread (effil) to use shared module instead of %S+ word-splitting
  • ✅ Both code paths now use identical formatting logic via the shared module
  • ✅ Old wrap_text_80_chars() retained for UI text (CW boxes, help pages, TXT export)
  • Architectural fix: eliminates divergent code paths that caused the bug

8-055: Fix Golden Poem Formatting - COMPLETED (2026-01-28)

  • ✅ Bug 1: HTML entity padding - now uses text_formatter.calculate_visible_width()
  • ✅ Bug 2: Junction positions - changed GOLDEN_LEFT_JUNCTION_POS from 9→10, GOLDEN_RIGHT_JUNCTION_POS from 70→71
  • ✅ Bug 3: Worker thread now receives layout config via thread_config.layout
  • Golden poems now render identically on chronological and similar/different pages
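
Bug 1 comes from padding by raw string length when the text contains HTML entities. The Python sketch below shows the idea behind a visible-width calculation. It is an assumption about what `text_formatter.calculate_visible_width()` does, it handles entities only (not tags or wide characters), and `pad_to_width` is a hypothetical helper.

```python
import html


def calculate_visible_width(text):
    """Width as rendered: each HTML entity counts as one character."""
    return len(html.unescape(text))


def pad_to_width(text, width):
    """Right-pad with spaces based on visible width, not raw length."""
    return text + " " * max(0, width - calculate_visible_width(text))


line = "a &lt; b"
print(len(line), calculate_visible_width(line))
print(repr(pad_to_width(line, 10)))
```

Padding by `len(line)` would leave this line three columns short in the browser, which is the kind of misalignment the box borders showed.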

8-050d: Configurable Poems-per-Word-Page - COMPLETED (2026-01-28)

  • ✅ Added --wordcloud-poems N CLI flag to run.sh
  • ✅ Added TUI menu item in "Word Cloud Options" section (hotkey 'p')
  • ✅ Updated src/generate-word-pages.lua to parse --poems-per-page argument
  • ✅ Configuration precedence: CLI > config.lua > default (50)
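
The precedence rule (CLI > config.lua > default) can be expressed compactly. The Python sketch below is illustrative only; the function and constant names are hypothetical.

```python
DEFAULT_POEMS_PER_WORD_PAGE = 50


def resolve_poems_per_page(cli_value=None, config_value=None):
    """Return the first value that is set: CLI flag, then config, then default."""
    for value in (cli_value, config_value):
        if value is not None:
            return int(value)
    return DEFAULT_POEMS_PER_WORD_PAGE


print(resolve_poems_per_page())                      # neither set
print(resolve_poems_per_page(config_value=30))       # config only
print(resolve_poems_per_page(cli_value="10", config_value=30))
```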

8-050b: Color-Contextualized Similarity Ranking - COMPLETED (2026-01-28)

  • ✅ Added balanced_color_select() function using cumulative-similarity round-robin
  • ✅ Pre-filters to top K candidates (K = N×7) for semantic relevance
  • ✅ Groups candidates into 7 color buckets, picks from lowest cumulative total
  • ✅ Falls back to pure similarity ranking if color embeddings unavailable
  • Result: Each word page now shows poems from all 7 semantic colors, not just the dominant one
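
The selection steps above can be sketched in Python as follows. This is an interpretation of the description, not the project's `balanced_color_select()`: the tuple layout, treating a missing color as "embeddings unavailable", and the tie-breaking rule are assumptions.

```python
from collections import defaultdict


def balanced_color_select(candidates, n, num_colors=7):
    """Pick n poem IDs while spreading picks across color buckets.

    candidates: list of (poem_id, similarity, color) tuples; color may be None.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)

    # Fallback: without color data, use pure similarity ranking.
    if any(color is None for _, _, color in ranked):
        return [poem_id for poem_id, _, _ in ranked[:n]]

    # Pre-filter to the top K = N x 7 candidates, then bucket by color.
    buckets = defaultdict(list)
    for candidate in ranked[: n * num_colors]:
        buckets[candidate[2]].append(candidate)  # each bucket stays sorted

    totals = {color: 0.0 for color in buckets}
    selected = []
    while len(selected) < n:
        open_colors = [color for color in buckets if buckets[color]]
        if not open_colors:
            break
        # Take from the bucket with the lowest cumulative similarity so far.
        # Assumption: ties go to the bucket whose next poem is most similar.
        color = min(open_colors, key=lambda c: (totals[c], -buckets[c][0][1]))
        poem_id, similarity, _ = buckets[color].pop(0)
        totals[color] += similarity
        selected.append(poem_id)
    return selected


poems = [(1, 0.9, "red"), (2, 0.8, "red"), (3, 0.7, "red"), (4, 0.6, "blue")]
print(balanced_color_select(poems, 2))
```

In the example, pure similarity ranking would return two red poems; the balanced version swaps the second for the best blue one.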

8-050c: Apply Word Color to Word-Page Header - COMPLETED (2026-01-28)

  • ✅ Implemented Option 3 (Hybrid): word color for header, per-poem colors for bars
  • ✅ Added word_hex_color parameter to generate_word_page()
  • ✅ Header word now rendered with semantic color: <font color="...">word</font>
  • Result: "silence" shows in blue, "fire" in red, while bars show color diversity from 8-050b

8-050e: Centroid-Based Chronological Link - COMPLETED (2026-01-28)

  • ✅ Added compute_centroid() and find_closest_poem_to_centroid() helper functions
  • ✅ Built chrono_page_map to map poem_index → chronological page number
  • ✅ Header now includes Main, Word Cloud, and centroid-targeted Chronological links
  • ✅ Chronological link points to the poem closest to the word's semantic center
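
A minimal Python sketch of the centroid lookup described above. The function names mirror the ones in the list, but the bodies are assumptions: the source does not say which distance measure is used, so cosine similarity is chosen here, and the `poems` mapping (poem_index to embedding) is a hypothetical input shape.

```python
import math


def compute_centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_closest_poem_to_centroid(poems, centroid):
    """Return the poem_index whose embedding is most similar to the centroid."""
    return max(poems, key=lambda idx: cosine_similarity(poems[idx], centroid))


poems = {10: [1.0, 0.0], 20: [0.0, 1.0], 30: [0.6, 0.8]}
centroid = compute_centroid(list(poems.values()))
closest = find_closest_poem_to_centroid(poems, centroid)

chrono_page_map = {10: 1, 20: 1, 30: 5}  # poem_index -> chronological page
print(closest, chrono_page_map[closest])
```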

8-050: Enhance Word-Cloud Semantic Similarity Pages - COMPLETED (2026-01-28)
All 5 sub-issues completed:

  • 8-050a: Word color computation ✓
  • 8-050b: Balanced color selection algorithm ✓
  • 8-050c: Word color in header ✓
  • 8-050d: Configurable poems-per-page ✓
  • 8-050e: Centroid-based chronological navigation ✓

Completion Criteria

  • [ ] run.sh generates complete website without manual intervention
  • [ ] All poem IDs have corresponding similar/XXX.html files
  • [ ] All poem IDs have corresponding different/XXX.html files
  • [ ] Navigation links use "different" (not "unique")
  • [ ] chronological.html links point to correct files
  • [ ] Generation completes in reasonable time

Phase Status: OPEN

Started: 2025-12-14