issues/8-progress.md
Phase 8 Progress Report
Phase 8 Goals
"Website Completion"
Phase 8 focuses on completing the website generation pipeline so that run.sh produces a fully deployable static website with all navigation working.
From Phase 7
- Pipeline executes with zero warnings and errors
- Output is clean, minimal, and informative
- All paths displayed as relative paths
- Validation statistics are accurate
Phase 8 Objectives
- Integrate complete HTML generation into automated pipeline
- Generate all similarity-sorted pages (similar/XXX.html)
- Generate all diversity-sorted pages (different/XXX.html)
- Rename "unique" to "different" for clarity
- Ensure all navigation links are functional
Phase 8 Issues
Active Issues
| Issue | Description | Status | Priority |
|---|---|---|---|
| 8-001 | Unified website generation pipeline | In Progress | High |
| 8-002 | Implement multi-threaded HTML generation | In Progress | High |
| 8-016 | Validate poem representation in pagination | Open | Medium |
| 8-020 | Hybrid pagination strategy (45GB constraint) | In Progress | High |
| 8-031 | Convert between similarity matrix formats | Open | Low |
| 8-034 | Refactor to triangular individual files only | Open | Low |
| 8-058 | Eliminate main thread / worker code duplication | Open | Medium |
Completed Issues
| Issue | Description | Status | Completed |
|---|---|---|---|
| 8-059 | Unify temp/ and tmp/ into a single tmpfs-symlinked tmp/ | Completed | 2026-06-15 |
| 8-057 | Boost visual formatting | Completed | 2026-01-29 |
| 8-045 | Timeline-based progress bar calculation | Completed | 2026-01-30 |
| 8-040 | Add images to similar/different pages | Completed | 2026-01-28 |
| 8-011 | Scrape fediverse boost content | Completed (279/458 scraped) | 2026-01-28 |
| 8-012 | Implement paginated similarity chapters | Completed | 2026-01-21 |
| 8-038 | Center poem containers on page | Completed | 2026-01-21 |
| 8-048 | Flatten media directory for deployment | Completed | 2026-01-23 |
| 8-051 | Order poem index categories by ascending count | Completed | 2026-01-26 |
| 8-052 | Normalize vertical bar characters in HTML output | Completed | 2026-01-26 |
| 8-053 | Add image title attribute and fix alt-text fallback | Completed | 2026-01-26 |
| 8-049 | Implement audio and video playback | Completed | 2026-01-28 |
| 8-054 | Extract image attachments from Matrix media messages | Completed | 2026-01-28 |
| 8-041 | Escape HTML characters in poem content | Completed | 2026-01-28 |
| 8-050 | Enhance word-cloud semantic similarity pages | Completed | 2026-01-28 |
| 8-050e | Centroid-based chronological link for word pages | Completed | 2026-01-28 |
| 8-050c | Apply word color to word-page header | Completed | 2026-01-28 |
| 8-050b | Color-contextualized similarity ranking | Completed | 2026-01-28 |
| 8-050d | Configurable poems-per-word-page via run.sh | Completed | 2026-01-28 |
| 8-050a | Compute semantic color for word-cloud words | Completed | 2026-01-28 |
| 8-055 | Fix golden poem formatting on similar/different pages | Completed | 2026-01-28 |
| 8-056 | Preserve whitespace in poem rendering (shared formatter) | Completed | 2026-01-28 |
| 8-029 | Consolidate similarity matrix functions | Completed | 2026-01-04 |
| 8-028 | Clean output directory of test/demo files | Completed | 2026-01-04 |
| 8-027 | Implement extendable diversity cache | Completed | 2026-01-04 |
| 8-026 | Add diversity computation progress display | Completed | 2026-01-04 |
| 8-025 | Fix diversity cache validation order bug | Completed | 2026-01-04 |
| 8-024 | Improve similarity matrix progress display | Completed | 2026-01-04 |
| 8-023 | Fix similarity matrix function call in run.sh | Completed | 2026-01-04 |
| 8-022 | Add pagination CLI flags to HTML generation | Completed | 2026-01-04 |
| 8-021 | Fix embedding progress counter overcounting | Completed | 2026-01-04 |
| 8-019 | Implement unique poem_index system | Completed | 2025-12-25 |
| 8-018 | Fix embedding directory case inconsistency | Completed | 2025-12-25 |
| 8-030 | Add chronological anchor links | Completed | 2026-01-09 |
| 8-003 | Remove remaining CSS from HTML generation | Completed | 2025-12-23 (reopened, re-completed) |
| 8-004 | Implement embedding validation and empty poem handling | Completed | 2025-12-14 |
| 8-006 | Fix golden poem box-drawing format | Completed | 2025-12-15 |
| 8-007 | Add box-drawing borders around navigation links | Completed | 2025-12-15 |
| 8-008 | Implement configurable centroid embedding system | Completed | 2025-12-23 |
| 8-009 | Project cleanup and organization | Completed | 2025-12-17 |
| 8-010 | Fix note filenames in generated HTML | Completed | 2025-12-23 |
| 8-013 | Implement TXT export functionality | Completed | 2025-12-23 |
| 8-015 | Implement ZIP extraction freshness check | Completed | 2025-12-23 |
| 8-005 | Integrate images into HTML output | Completed | 2025-12-23 |
| 8-035 | Colorize nav boxes according to progress bar | Completed | 2026-01-21 |
| 8-036 | Add poem identification to ranking headers | Completed | 2026-01-21 |
| 8-039 | Move chronological files to subdirectory | Completed | 2026-01-21 |
| 8-042 | Sync images from configurable directories | Completed | 2026-01-21 |
| 8-043 | Generate semantic word cloud page (full) | Completed | 2026-01-21 |
| 8-046 | Create menu navigation page | Completed | 2026-01-21 |
| 8-047 | Implement dark mode (always on) | Completed | 2026-01-21 |
| 8-005 | Integrate images into HTML output (re-completed) | Completed | 2026-01-21 |
| 8-030 | Add chronological anchor links (re-completed) | Completed | 2026-01-21 |
| 8-037 | Fix similar/different box alignment (re-completed) | Completed | 2026-01-21 |
8-047: Implement Dark Mode (Always On) - COMPLETED (2026-01-21)
- ✅ Updated 17 `<body>` tags across 6 generator files
- ✅ True black background (`#000000`), white text (`#FFFFFF`)
- ✅ Accessible link colors (`#6699FF`, `#9966FF`)
- ✅ CSS-free using HTML `bgcolor`/`text`/`link`/`vlink` attributes
8-046: Create Menu Navigation Page - COMPLETED (2026-01-21)
- ✅ Changed header "How to explore" link to "Menu" → wordcloud.html
- ✅ Added poem index section to wordcloud.html
- ✅ Poems grouped by category (fediverse, notes, messages, bluesky)
- ✅ Each poem links to its chronological position
8-043: Generate Semantic Word Cloud Page - RE-COMPLETED (2026-01-21)
- ✅ Word cloud words now link to `wordcloud/{word}.html` similarity pages
- ✅ Created `src/generate-word-pages.lua` for generating word similarity pages
- ✅ Word embeddings cached to `word_embeddings.json`
- ✅ Pages show top 50 most similar poems per word
8-005: Integrate Images into HTML Output - RE-COMPLETED (2026-01-21)
- ✅ Fixed viewport overflow with `style="max-width:100%; height:auto"`
- ✅ Pragmatic CSS exception approved for responsive images
- ✅ Maintains aspect ratio, prevents horizontal scroll
8-030: Add Chronological Anchor Links - RE-COMPLETED (2026-01-21)
- ✅ Fixed anchor ID format mismatch in effil worker
- ✅ Chronological pages: `id="poem-fediverse-4210"` (full category)
- ✅ Similar/different links: `href="...#poem-fediverse-5000"` (now matches)
- ✅ Pagination-aware links working: `chronological/05.html#poem-fediverse-5000`
8-037: Fix Similar/Different Box Alignment - RE-COMPLETED (2026-01-21)
- ✅ Fixed bottom progress bar off-by-one in effil worker
- ✅ Corrected `build_segment()` call positions (1 to `LEFT_JUNCTION`, not 0)
- ✅ Junction characters (`╧`, `┴`) now align with box corners
- ✅ Similar/different pages match chronological page formatting
8-039: Move Chronological Files to Subdirectory - RE-COMPLETED (2026-01-21)
- ✅ Changed output from `chronological-XX.html` to `chronological/XX.html`
- ✅ Updated `generate_chronological_page_navigation()` for relative paths
- ✅ Updated redirect/index file to point to `chronological/index.html`
- ✅ Updated similar/different page links to the `chronological/` subdirectory
- ✅ Manually deleted 79 old `chronological-*.html` files from output root
- ✅ Fixed internal pagination links (`%s.html` instead of `chronological-%s.html`)
8-042: Sync Images From Configurable Directories - COMPLETED (2026-01-21)
- ✅ Added `image_sync` section to `config/input-sources.json`
- ✅ Added `sync_images_from_config()` function to `scripts/update-words`
- ✅ Uses `jq` for JSON parsing, `rsync` for file syncing
- ✅ Supports multiple source directories with `preserve_structure` option
- ✅ Reports per-source and total sync statistics
8-043: Generate Semantic Word Cloud Page (MVP) - COMPLETED (2026-01-21)
- ✅ Created `config/stop-words.txt` with 271 categorized stop words
- ✅ Created `src/wordcloud-generator.lua` with frequency-based sizing
- ✅ Added `word_cloud` section to `config/input-sources.json`
- ✅ Updated Issue 10-003 with vimfolded config sections
- ✅ 200 words displayed from 23,455 unique (after filtering)
- ⏳ Future: embedding-based semantic weighting, word similarity pages
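Frequency-based sizing can be sketched roughly as follows. This is a hedged Python illustration, not `src/wordcloud-generator.lua` itself. Three things are assumptions: the tokenizer regex, the linear scaling, and the 1-7 size range. That range is the one accepted by the HTML `<font size>` attribute, which suits the CSS-free output. `build_word_cloud` is a hypothetical name.

```python
import re
from collections import Counter

def build_word_cloud(texts, stop_words, max_words=200, min_size=1, max_size=7):
    """Return [(word, count, size)] for the most frequent non-stop words.

    Sizes are scaled linearly between min_size and max_size, assumed here to be
    HTML <font size> values (1-7)."""
    counts = Counter(
        word
        for text in texts
        for word in re.findall(r"[a-z']+", text.lower())
        if word not in stop_words
    )
    top = counts.most_common(max_words)
    if not top:
        return []
    highest, lowest = top[0][1], top[-1][1]
    span = max(highest - lowest, 1)  # avoid dividing by zero when all counts tie
    return [
        (word, count, min_size + round((count - lowest) * (max_size - min_size) / span))
        for word, count in top
    ]
```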
8-035: Colorize Nav Boxes According to Progress Bar - COMPLETED (2026-01-21)
- ✅ Added `colorize_char()` helper function for color wrapping
- ✅ Modified `generate_regular_corner_box_top()` with progressive colorization
- ✅ Modified `generate_regular_corner_box_nav_line()` with wall colorization
- ✅ Updated effil worker thread with equivalent inline functions
- ✅ Left box (positions 0-10) colorizes as progress reaches them
- ✅ Right box (positions 70-82) colorizes when progress passes 70
- ✅ Uses poem's semantic color (red, blue, green, purple, orange, yellow, gray)
8-036: Add Poem Identification to Ranking Headers - COMPLETED (2026-01-21)
- ✅ Added `get_source_path()` helper to effil worker thread
- ✅ Ranking headers now show: `--- #N category/identifier ---`
- ✅ Notes: `notes/source_file` (e.g., "notes/what-a-lame-movie")
- ✅ Bluesky: `bluesky#N` (e.g., "bluesky#42")
- ✅ Fediverse: `fediverse/N` (e.g., "fediverse/1234")
- ✅ Messages: `messages/N` (e.g., "messages/567")
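The identifier formats listed above fit in a small sketch. It assumes each poem is a dict with hypothetical `category`, `id` and `source_file` keys, and that `N` in the header is the poem's rank. The real `get_source_path()` is Lua code running in the effil worker.

```python
def get_source_path(poem):
    """Identifier shown in ranking headers, following the formats listed above."""
    category = poem["category"]
    if category == "notes":
        return f"notes/{poem['source_file']}"   # descriptive note filename
    if category == "bluesky":
        return f"bluesky#{poem['id']}"
    return f"{category}/{poem['id']}"            # fediverse/N, messages/N

def ranking_header(rank, poem):
    """Header line such as '--- #1 notes/what-a-lame-movie ---' (rank assumed)."""
    return f"--- #{rank} {get_source_path(poem)} ---"
```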
8-030: Add Chronological Anchor Links - COMPLETED (2026-01-09)
- ✅ Created `get_poem_anchor_id()` helper function
- ✅ Updated navigation functions for three-part layout
- ✅ Added chronological links centered between similar/different
- ✅ Added HTML anchor IDs to chronological.html
- ✅ Layout: `│ similar │ chronological │ different │`
- ✅ Test: 100 chronological links found in generated pages
- ✅ Format: `chronological.html#poem-{category}-{id}`
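Below is a sketch of the anchor and link formats. It adds a hypothetical helper that maps a 1-based `poem_index` to a page, assuming a fixed number of poems per page. Only the string formats come from this report. `chronological_page_for` and the fixed page size are assumptions; 8-050e's `chrono_page_map` is the project's own lookup.

```python
def get_poem_anchor_id(category, poem_id):
    """Anchor ID placed on each poem in the chronological pages."""
    return f"poem-{category}-{poem_id}"

def chronological_page_for(poem_index, poems_per_page):
    """1-based page for a 1-based poem_index (hypothetical fixed page size)."""
    return (poem_index - 1) // poems_per_page + 1

def chronological_link(category, poem_id, page):
    """Pagination-aware link such as chronological/05.html#poem-fediverse-5000."""
    return f"chronological/{page:02d}.html#{get_poem_anchor_id(category, poem_id)}"
```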
Issue Details
8-001: Unified Website Generation Pipeline - IN PROGRESS (Phases 1-6 complete)
- ✅ Phase 1: HTML Integration - complete
- ✅ Phase 2: Parallel HTML Generation - complete
- ✅ Phase 3: Embedding Integration - complete (2025-12-25)
  - `--generate-embeddings` flag with freshness checks
  - `--model` option for model selection
  - All 7,844 poems have embeddings (100% complete as of 2026-01-04)
- ✅ Phase 4: Similarity Matrix Integration - complete (2025-12-25)
  - `--generate-similarity` flag with dependency validation
  - Run `./scripts/validate-pipeline-data --quick` to check current completion
- ✅ Phase 5: Diversity Cache Integration - complete (2025-12-25)
  - `--generate-diversity` flag with freshness checks
  - Extendable cache with incremental saves implemented (8-027)
- ✅ Phase 6: Pipeline Orchestration - complete (2025-12-25)
  - `--full` runs all 10 stages, `--all` runs fast stages (1-5, 9-10)
  - TUI updated with new stages (marked ⚠️ for expensive ones)
- Pending: Full HTML regeneration with boost formatting (8-057)
  - Run `./run.sh --generate-html` after merging boost formatting changes
8-002: Implement Multi-threaded HTML Generation - IN PROGRESS
- ✅ Created `scripts/generate-html-parallel` using the effil library
- ✅ Similarity page generation working (10 pages/sec with 4 threads)
- ✅ Batch-based thread pool with progress reporting
- ✅ Difference page generation working (centroid-based diversity algorithm)
- ✅ 62MB embeddings loaded and shared via effil.table
- ✅ Option C optimization: `scripts/precompute-diversity-sequences` created
- ✅ Thermal management with configurable sleep between batches
- ✅ Cache-based fast path in generate-html-parallel
- Pending: Run pre-computation (~42 hours), pipeline integration
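The batch-plus-cooldown pattern can be sketched with Python's `concurrent.futures`. This stands in for the effil thread pool and does not reproduce its API. `run_in_batches`, `batch_size` and `cooldown_seconds` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_in_batches(items, work, threads=4, batch_size=16, cooldown_seconds=0.0, report=print):
    """Run work(item) on a thread pool one batch at a time.

    Progress is reported after each batch; the optional sleep between batches is
    the thermal-management knob. Results come back in input order."""
    results = []
    total = len(items)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for start in range(0, total, batch_size):
            batch = items[start:start + batch_size]
            results.extend(pool.map(work, batch))  # blocks until the batch finishes
            done = start + len(batch)
            report(f"{done}/{total} pages")
            if cooldown_seconds and done < total:
                time.sleep(cooldown_seconds)       # let the CPU cool before the next batch
    return results
```

Python threads share one interpreter lock. For CPU-bound page rendering, a `ProcessPoolExecutor` would be the closer analogue to effil's independent Lua states; the batching and sleep logic stays the same.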
8-003: Remove Remaining CSS from HTML Generation - COMPLETED (2025-12-23 re-completed)
- ✅ Removed 3 `<style>` blocks from templates (Phase 1)
- ✅ Replaced inline `style=` with `<font color=""><b>` tags (Phase 1)
- ✅ Removed container div inline styles (Phase 1)
- ✅ Verified: 0 style attributes, 15,576 font color tags in test output (Phase 1)
- ✅ [PHASE 2] Removed remaining 4 `style=` attributes missed in Phase 1:
  - Image tags: `style="max-width:100%%; height:auto;"` (2 occurrences)
  - Pre tags: `style="text-align: left; max-width: 90ch; margin: 0 auto;"` (2 occurrences)
- ✅ [PHASE 2] Templates now use plain `<pre>` tag without CSS
- ✅ [PHASE 2] Verified: 0 style attributes in generated HTML
8-004: Implement Embedding Validation and Empty Poem Handling - COMPLETED
- ✅ Empty poems now get random embeddings (seeded by poem ID for reproducibility)
- ✅ Random embeddings normalized to unit vectors
- ✅ Added `is_random = true` flag to identify synthetic embeddings
- ✅ Pre-flight validation in `scripts/generate-html-parallel`
- ✅ Pre-flight validation in `scripts/precompute-diversity-sequences`
- ✅ Scripts exit with helpful error if poems with content lack embeddings
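The following sketches seeded random unit embeddings for empty poems. Seeding from the poem ID makes each vector reproducible across runs. Drawing each component from a standard normal and then normalizing gives a direction spread uniformly over the unit sphere. The 768-dimension default and the function name are assumptions. Python's generator will not reproduce the Lua pipeline's actual vectors.

```python
import math
import random

def random_embedding(poem_id, dimensions=768):
    """Deterministic random unit vector for a poem with no content (sketch)."""
    rng = random.Random(poem_id)                     # seeded by poem ID: reproducible
    vector = [rng.gauss(0.0, 1.0) for _ in range(dimensions)]
    norm = math.sqrt(sum(x * x for x in vector))
    return {"embedding": [x / norm for x in vector], "is_random": True}
```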
8-005: Integrate Images into HTML Output - OPEN
- Image catalog exists (539 images in `assets/image-catalog.json`)
- `flat-html-generator.lua` does not consume image catalog
- Need to associate images with poems via source metadata
- Render `<img>` tags in poem HTML output
8-006: Fix Golden Poem Box-Drawing Format - COMPLETED
- ✅ Rewrote `apply_golden_poem_formatting()` with proper 84-char box
- ✅ Integrated progress bar colors into golden corners (╔═─┐ / ╚═─┘)
- ✅ Added side borders with padding (║ content │) to each content line
- ✅ Updated `generate_progress_dashes()` to 82-char border width
- ✅ Fixed line splitting and text wrapping to preserve paragraph breaks
- ✅ 244 golden poems now render correctly with 80-char content area
8-007: Add Box-Drawing Borders Around Navigation Links - COMPLETED
- ✅ Added corner box separator line: `╟─────────┐` + gap + `┌───────────┤`
- ✅ Added corner box navigation line with vertical walls
- ✅ Bottom border junctions adapt to progress: `╧` for the ═ section, `┴` for the ─ section
- ✅ Regular (non-golden) poems now have corner boxes connecting to progress bar
- ✅ Corner characters: `╘` (left) and `┘` (right) close regular poem corner boxes
8-013: Implement TXT Export Functionality - COMPLETED (2025-12-23)
- ✅ `render_attachment_images_txt()` for `[Image: alt-text]` format
- ✅ `strip_html_tags()` for removing HTML and decoding entities
- ✅ `generate_txt_file_header()` for consistent file headers
- ✅ `generate_similarity_txt_file()` with headers
- ✅ `generate_diversity_txt_file()` with headers
- ✅ `M.generate_chronological_txt_file()` created and integrated
- ✅ Pipeline integration (regenerate-clean-site.lua, main.lua)
- Note: Download links in HTML pages moved to 8-012 scope
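When converting HTML to text, strip tags before decoding entities. If entities were decoded first, escaped text such as `&lt;b&gt;` would become a real-looking tag, and the tag pass would then delete it. Below is a hedged Python sketch of that order; it is not the project's Lua `strip_html_tags()`.

```python
import html
import re

def strip_html_tags(fragment):
    """Remove tags first, then decode entities (&amp;, &lt;, &#39;, ...)."""
    return html.unescape(re.sub(r"<[^>]*>", "", fragment))
```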
8-012: Implement Paginated Similarity Chapters - IN PROGRESS (Phases A+B+C+D complete)
- ✅ Circular dependency with 8-013 resolved
- ✅ Added pagination config to `config/input-sources.json`
- ✅ Documented `minimum_pages` setting requirement
- ✅ Phase A: Core pagination logic implemented (10 new functions)
- ✅ Phase B: Prev/next navigation implemented
- ✅ Test: 134KB page with 100 poems, proper navigation
- ✅ Phase C: Export Format Integration (2026-01-09)
  - Download links (.txt and .html archive) in every paginated page
  - Created `generate_similarity_html_archive()` and `generate_diversity_html_archive()`
  - Integrated into `M.generate_flat_html_with_similarity_and_diversity()`
  - Test: All links present and correctly formatted
- ✅ Phase D: Generation Strategy - COMPLETED (2026-01-11)
  - Created `parse_pages_specification()` for `--pages` flag parsing ("1", "all", "1-10")
  - Integrated pagination into `M.generate_complete_flat_html_collection()`
  - Updated CLI argument parsing (`utils.parse_cli_args`)
  - Modified main.lua to pass pages parameter through pipeline
  - Test: Pagination generates correct filenames (`similar/0001-01.html` format)
  - Files use poem_index (numeric) for consistent naming
  - Default: generates page 1 only (`minimum_pages=1`)
- Pending: Phase E - Integration (entry points, testing, max_pages enforcement)
  - ✅ Phase E Step 16 - Chronological stays as single file (per 8-020)
- Related: 8-016 (validator) depends on this issue
- Modified by 8-020: Hybrid pagination strategy
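The sketch below shows one way a `--pages` value might be parsed, together with the `similar/0001-01.html` naming (four-digit poem_index, two-digit page). Clamping to `max_pages` is an assumption here, since max_pages enforcement is still pending in Phase E. `page_filename` is a hypothetical helper.

```python
def parse_pages_specification(spec, max_pages):
    """Parse "1", "all" or "1-10" into a list of 1-based page numbers."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, max_pages + 1))
    if "-" in spec:
        first, last = (int(part) for part in spec.split("-", 1))
    else:
        first = last = int(spec)
    if first < 1 or last < first:
        raise ValueError(f"invalid page range: {spec!r}")
    return list(range(first, min(last, max_pages) + 1))  # assumed clamp to max_pages

def page_filename(direction, poem_index, page):
    """e.g. similar/0001-01.html"""
    return f"{direction}/{poem_index:04d}-{page:02d}.html"
```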
8-020: Hybrid Pagination Strategy - IN PROGRESS
- Storage constraint: 45 GB Neocities limit
- Full chronological.html: All 7,793 poems (~12 MB) - NOT paginated
- Paginated similar/different: Max 15 pages per poem × 100 poems per page = 1,500 poems per direction
- Storage budget: ~38 GB used of 45 GB available
- Reserved: ~31 MB for Phase 11 maze pages
- Modifies 8-012, 8-016 validation scope
8-016: Validate Poem Representation in Pagination - OPEN
- Depends on 8-012 completion
- Post-generation validator to ensure all poems appear in output
- Optional `--fix` flag to regenerate missing pages
- Pipeline integration for deployment confidence
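Here is a minimal sketch of the post-generation check. It assumes the `{direction}/{poem_index:04d}-01.html` naming from 8-012 and that page 1 must exist for every poem. `find_missing_pages` is hypothetical. Its return value is the kind of list an optional `--fix` pass could regenerate.

```python
from pathlib import Path

def find_missing_pages(output_dir, poem_indices, directions=("similar", "different")):
    """Return (direction, poem_index) pairs whose first page file is missing."""
    root = Path(output_dir)
    missing = []
    for direction in directions:
        for index in poem_indices:
            if not (root / direction / f"{index:04d}-01.html").is_file():
                missing.append((direction, index))
    return missing
```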
8-008: Implement Configurable Centroid Embedding System - COMPLETED
- ✅ Created `assets/centroids.json` config with 5 example moods (melancholy, wonder, rage, tenderness, absurdity)
- ✅ Implemented `src/centroid-generator.lua` for embedding generation via Ollama
- ✅ Implemented recursive chunking algorithm for long content (not triggered for keyword-only centroids)
- ✅ Created `src/centroid-html-generator.lua` for HTML page generation
- ✅ Generated 11 files: index.html + 5 similar + 5 different pages in `output/centroid/`
- ✅ Similarity scores verified working (0.72-0.78 range for top matches)
- Use cases: themed entry points, mood-based exploration, curated collections
8-010: Fix Note Filenames in Generated HTML - COMPLETED
- ✅ Created `get_poem_display_filename()` helper function
- ✅ Notes now display original descriptive filenames (e.g., `notes/what-a-lame-movie`)
- ✅ Fediverse/messages display numeric ID without `.txt` extension
- ✅ Updated all 4 file header generation locations
- ✅ Verified: notes show `source_file`, no `.txt` extensions anywhere
8-056: Preserve Whitespace in Poem Rendering - COMPLETED (2026-01-28)
- ✅ Created `libs/text-formatter.lua` shared module for whitespace preservation
- ✅ Updated main thread to preserve whitespace for ALL categories (not just notes)
- ✅ Updated worker thread (effil) to use shared module instead of `%S+` word-splitting
- ✅ Both code paths now use identical formatting logic via the shared module
- ✅ Old `wrap_text_80_chars()` retained for UI text (CW boxes, help pages, TXT export)
- Architectural fix: eliminates divergent code paths that caused the bug
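The bug class is easy to see in a small Python sketch. Rebuilding a line from `%S+`-style tokens collapses indentation and internal spacing. Tokenizing whitespace runs separately keeps them. Both functions are hypothetical illustrations, not the API of `libs/text-formatter.lua`.

```python
import re

def collapse_words(line):
    """Old worker behaviour: rebuilding a line from %S+ tokens loses its spacing."""
    return " ".join(re.findall(r"\S+", line))

def wrap_preserving_whitespace(line, width=80):
    """Wrap at `width` characters, keeping indentation and internal whitespace runs."""
    wrapped, current = [], ""
    for token in re.findall(r"\s+|\S+", line):
        if current.strip() and len(current) + len(token) > width:
            wrapped.append(current.rstrip())
            # A continuation line starts at the next word, not at the break's spaces.
            current = token.lstrip()
        else:
            current += token
    wrapped.append(current.rstrip())
    return wrapped
```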
8-055: Fix Golden Poem Formatting - COMPLETED (2026-01-28)
- ✅ Bug 1: HTML entity padding - now uses `text_formatter.calculate_visible_width()`
- ✅ Bug 2: Junction positions - changed `GOLDEN_LEFT_JUNCTION_POS` from 9→10, `GOLDEN_RIGHT_JUNCTION_POS` from 70→71
- ✅ Bug 3: Worker thread now receives layout config via `thread_config.layout`
- Golden poems now render identically on chronological and similar/different pages
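Bug 1 comes from measuring the HTML source string instead of what the browser displays. The sketch below counts visible width, assuming tags render as zero width and each character reference as one character. Wide CJK or emoji glyphs are not handled. It is not the project's `calculate_visible_width()`.

```python
import re

TAG = re.compile(r"<[^>]*>")
ENTITY = re.compile(r"&(?:#\d+|#x[0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);")

def calculate_visible_width(html_line):
    """Rendered width: tags take no columns, each character reference takes one."""
    return len(ENTITY.sub("x", TAG.sub("", html_line)))

def pad_to_width(html_line, width=80):
    """Right-pad with spaces so the box's right border lines up."""
    return html_line + " " * max(0, width - calculate_visible_width(html_line))
```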
8-050d: Configurable Poems-per-Word-Page - COMPLETED (2026-01-28)
- ✅ Added `--wordcloud-poems N` CLI flag to run.sh
- ✅ Added TUI menu item in "Word Cloud Options" section (hotkey 'p')
- ✅ Updated `src/generate-word-pages.lua` to parse `--poems-per-page` argument
- ✅ Configuration precedence: CLI > config.lua > default (50)
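The CLI > config > default precedence comes down to taking the first value that was actually supplied. Here is a small Python sketch; the function and constant names are hypothetical.

```python
DEFAULT_POEMS_PER_WORD_PAGE = 50

def resolve_poems_per_page(cli_value=None, config_value=None):
    """CLI flag wins, then the config.lua value, then the built-in default."""
    for value in (cli_value, config_value):
        if value is not None:
            count = int(value)
            if count < 1:
                raise ValueError("poems per page must be positive")
            return count
    return DEFAULT_POEMS_PER_WORD_PAGE
```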
8-050b: Color-Contextualized Similarity Ranking - COMPLETED (2026-01-28)
- ✅ Added `balanced_color_select()` function using cumulative-similarity round-robin
- ✅ Pre-filters to top K candidates (K = N×7) for semantic relevance
- ✅ Groups candidates into 7 color buckets, picks from lowest cumulative total
- ✅ Falls back to pure similarity ranking if color embeddings unavailable
- Result: Each word page now shows poems from all 7 semantic colors, not just the dominant one
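Below is a sketch of the selection loop described above, under a few assumptions:

- Candidates are `(poem_id, similarity, color)` tuples.
- When buckets tie on the lowest total, the bucket whose best candidate ranked highest wins.
- Each pick takes the most similar remaining poem in the chosen bucket.

The Lua `balanced_color_select()` may differ in these details.

```python
def balanced_color_select(candidates, n, num_colors=7):
    """Pick n poem IDs spread across color buckets by cumulative similarity."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if any(color is None for _, _, color in ranked):
        return [pid for pid, _, _ in ranked[:n]]       # fallback: pure similarity
    pool = ranked[: n * num_colors]                    # top K = N x 7 candidates
    buckets = {}
    for candidate in pool:                             # buckets stay similarity-sorted
        buckets.setdefault(candidate[2], []).append(candidate)
    totals = {color: 0.0 for color in buckets}
    picked = []
    while len(picked) < n and any(buckets.values()):
        # Non-empty bucket whose picks so far have the lowest cumulative similarity.
        color = min((c for c in buckets if buckets[c]), key=lambda c: totals[c])
        pid, similarity, _ = buckets[color].pop(0)
        totals[color] += similarity
        picked.append(pid)
    return picked
```

Take one strong red cluster and a weaker blue one. After the first red pick, red's running total is ahead, so the loop draws from blue until blue catches up.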
8-050c: Apply Word Color to Word-Page Header - COMPLETED (2026-01-28)
- ✅ Implemented Option 3 (Hybrid): word color for header, per-poem colors for bars
- ✅ Added `word_hex_color` parameter to `generate_word_page()`
- ✅ Header word now rendered with semantic color: `<font color="...">word</font>`
- Result: "silence" shows in blue, "fire" in red, while bars show color diversity from 8-050b
8-050e: Centroid-Based Chronological Link - COMPLETED (2026-01-28)
- ✅ Added `compute_centroid()` and `find_closest_poem_to_centroid()` helper functions
- ✅ Built `chrono_page_map` to map poem_index → chronological page number
- ✅ Header now includes Main, Word Cloud, and centroid-targeted Chronological links
- ✅ Chronological link points to the poem closest to the word's semantic center
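The following sketches the centroid lookup. It assumes the word's semantic centre is the mean of the embeddings of the poems on its page, and that "closest" means highest cosine similarity. The resulting `poem_index` would then go through `chrono_page_map` to build the link.

```python
import math

def compute_centroid(vectors):
    """Component-wise mean of equal-length embedding vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def cosine_similarity(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 0.0                                  # degenerate (zero) vector
    return sum(x * y for x, y in zip(a, b)) / norm

def find_closest_poem_to_centroid(embeddings):
    """embeddings: {poem_index: vector}. Returns the poem_index nearest the centroid."""
    centroid = compute_centroid(list(embeddings.values()))
    return max(embeddings, key=lambda idx: cosine_similarity(embeddings[idx], centroid))
```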
8-050: Enhance Word-Cloud Semantic Similarity Pages - COMPLETED (2026-01-28)
All 5 sub-issues completed:
- 8-050a: Word color computation ✓
- 8-050b: Balanced color selection algorithm ✓
- 8-050c: Word color in header ✓
- 8-050d: Configurable poems-per-page ✓
- 8-050e: Centroid-based chronological navigation ✓
Completion Criteria
- [ ] `run.sh` generates complete website without manual intervention
- [ ] All poem IDs have corresponding similar/XXX.html files
- [ ] All poem IDs have corresponding different/XXX.html files
- [ ] Navigation links use "different" (not "unique")
- [ ] chronological.html links point to correct files
- [ ] Generation completes in reasonable time
Phase Status: OPEN
Started: 2025-12-14