docs/project-overview.md
Neocities Poetry Modernization Project
Project Summary
This project modernizes the ritzmenardi.com website by adding a poetry recommendation system. The system processes 7,797 poems from multiple sources (words.pdf, fediverse posts, messages, notes), generates local LLM embeddings for each poem, and uses a Lua-based similarity engine to turn those embeddings into similarity-based recommendations.
Current Status: Phase 8 Website Completion 🔄
Completed Systems:
- ✅ Complete poem extraction and validation (7,797 poems)
- ✅ Advanced similarity engine with incremental caching
- ✅ Per-model embedding storage system supporting multiple models
- ✅ Robust network error handling and retry mechanisms
- ✅ Interactive bash CLI with real-time progress monitoring
- ✅ Comprehensive cache management and flush operations
- ✅ Complete HTML generation system with similarity navigation
- ✅ Golden poem identification and collection features
- ✅ JavaScript-free responsive design for mobile/desktop
- ✅ Data integrity improvements and infrastructure optimization
Current Phase: Dual system implementation (similarity + diversity) and similarity algorithm research
Next Phase: Visual content integration and user experience enhancements
Key Features
- Poem Extraction ✅: Successfully extracted 7,797 individual poems from multiple sources
- Embedding Generation ✅: Multi-model support (EmbeddingGemma:latest, text-embedding-ada-002, all-MiniLM-L6-v2)
- Similarity Engine ✅: Cosine similarity calculation with intelligent caching
- Incremental Processing ✅: Smart detection of existing embeddings for efficient updates
- HTML Generation ✅: Complete static HTML system with ~6,400 related/different pages
- Golden Poem Features ✅: Fediverse-prioritized golden poem identification and collection
- Responsive Design ✅: Mobile-optimized interface without JavaScript dependencies
- Advanced Discovery 🔄: Dual exploration system (simple similarity + progressive centroid diversity) and algorithm research (Phase 5)
- Visual Content Integration 📋: Image placement and content warning systems (Phase 6)
- Export Systems 📋: PDF generation with words-pdf styling (Phase 6)
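The cosine similarity step named in the Similarity Engine feature can be illustrated with a minimal sketch. This is Python for brevity, not the project's Lua code, and the names `cosine_similarity`, `most_similar`, and the toy poem IDs are hypothetical. Treating a zero vector as similarity 0 is an assumption, not documented project behavior.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

def most_similar(poem_id, embeddings, top_k=3):
    """Rank every other poem by similarity to poem_id, highest first."""
    target = embeddings[poem_id]
    scored = [
        (other_id, cosine_similarity(target, vec))
        for other_id, vec in embeddings.items()
        if other_id != poem_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional embeddings; real embedding models produce far more dimensions.
embeddings = {
    "poem-1": [1.0, 0.0, 0.0],
    "poem-2": [0.9, 0.1, 0.0],
    "poem-3": [0.0, 1.0, 0.0],
}
ranking = most_similar("poem-1", embeddings, top_k=2)
```

With this toy data, `poem-2` ranks first for `poem-1` because its vector points in nearly the same direction, while `poem-3` is orthogonal and scores 0.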
Technical Architecture
- Processing Backend: Lua-based similarity engine with comprehensive CLI tools
- Embedding Models: Multiple model support via Ollama (EmbeddingGemma:latest, etc.)
- Storage System: Per-model JSON caching with automatic migration
- Network Resilience: Exponential backoff retry with configurable error thresholds
- Data Flow: source materials (words.pdf, fediverse posts, messages, notes) → extracted poems → per-model embeddings → similarity matrix → HTML pages
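The Network Resilience item above can be sketched as a retry loop with exponential backoff. This is an illustrative Python sketch, not the project's Lua implementation: `call_with_backoff`, `TooManyFailures`, and every parameter name and default are assumptions. Here `max_retries` stands in for the configurable error threshold.

```python
import time

class TooManyFailures(Exception):
    """Raised when failed attempts reach the configured threshold."""

def call_with_backoff(request, max_retries=4, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call request() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except (ConnectionError, TimeoutError) as err:
            if attempt == max_retries:
                raise TooManyFailures(
                    f"gave up after {attempt + 1} attempts"
                ) from err
            # Delays grow as base, 2*base, 4*base, ... capped at max_delay.
            delay = min(base_delay * (2 ** attempt), max_delay)
            sleep(delay)

# Hypothetical embedding request that fails twice before succeeding.
calls = {"count": 0}

def flaky_embedding_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("embedding service unreachable")
    return [0.1, 0.2, 0.3]

waits = []  # record the delays instead of actually sleeping
vector = call_with_backoff(flaky_embedding_request, sleep=waits.append)
```

Passing `sleep=waits.append` keeps the example instant. In this run the recorded delays are 1.0 and 2.0 seconds before the third attempt succeeds.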
System Capabilities
- Incremental Processing: Only processes new/changed poems for efficiency
- Model Isolation: Separate storage for different embedding models
- Error Recovery: Robust handling of network issues and service interruptions
- Cache Management: Flush operations with backup and selective cleaning
- Real-time Monitoring: Live progress bars with accurate time estimates
- Dual Exploration System: Simple similarity ranking for focused discovery + progressive centroid diversity for expansive exploration
- Cross-Navigation: Seamless switching between similarity and diversity exploration modes
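The document does not spell out the progressive centroid diversity procedure. One plausible reading, shown here as a hedged Python sketch rather than the project's Lua code, is to start from a seed poem and repeatedly pick the poem least similar to the centroid of everything chosen so far. The function names, the tie-breaking rule, and the toy embeddings are all assumptions.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]

def diverse_sequence(seed_id, embeddings, length):
    """Assumed procedure: each step adds the poem least similar to the
    centroid of the poems chosen so far."""
    chosen = [seed_id]
    remaining = set(embeddings) - {seed_id}
    while remaining and len(chosen) < length:
        center = centroid([embeddings[pid] for pid in chosen])
        # sorted() makes tie-breaking deterministic (lowest ID wins).
        next_id = min(
            sorted(remaining),
            key=lambda pid: cosine_similarity(center, embeddings[pid]),
        )
        chosen.append(next_id)
        remaining.remove(next_id)
    return chosen

# Toy 2-dimensional embeddings with hypothetical poem IDs.
embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.2],
}
path = diverse_sequence("a", embeddings, length=3)
```

With the toy data, the path is `a`, `d`, `b`. The centroid shifts after each pick, so `b` is skipped at first for being close to `a`, then becomes the most distant remaining option. Simple similarity ranking compares against a single poem only, which is what keeps that mode focused.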
Source Materials
- Poetry source: /home/ritz/programming/ai-stuff/words-pdf
- Website backup: /home/ritz/neocities
- Ollama installation: /home/ritz/programs/ollama