issues/completed/10-003b-external-files-syncing-centralization.md
Issue 10-003b: External Files Syncing Centralization
Priority: Medium
Phase: 10 (Developer Experience & Tooling)
Status: Completed
Created: 2026-01-30
Parent Issue: 10-003
Summary
Centralize all external file syncing operations into a single external_files config
section. No external files should be pulled unless they're specified in this section.
Current Behavior
External file syncing operations are hardcoded in scripts:
| Script | Lines | Source | Destination |
|---|---|---|---|
| scripts/update | 117-129 | /home/ritz/backups/bluesky/input | input/bluesky/ |
| scripts/update-words | 170-284 | Config-driven (image_sync.sources) | input/media_attachments/ |
Additionally, scripts depend on specific filenames:
- scripts/update lines 199-206: Checks for repo.car or repo-2.car specifically
- scripts/extract-bluesky-data line 20: Defaults to repo-2.car
Issues with current approach:
- External paths are scattered across multiple scripts
- Scripts depend on specific filenames (breaks if filename changes)
- No single view of what external data the pipeline requires
- Cannot easily disable or redirect a source
- Hard to replicate setup on another machine
Intended Behavior
All external file syncing declared in a single external_files config section:
-- {{{ external_files
-- Defines all external files/directories the pipeline pulls from.
-- NO external file operations should occur unless configured here.
-- All destinations are relative to input/.
-- Archive selection (newest by mtime) is handled by downstream scripts.
external_files = {
    {
        name = "bluesky-car",
        source = "/home/ritz/backups/bluesky/input",
        destination = "bluesky",
    },
    {
        name = "my-art",
        source = "/home/ritz/pictures/my-art",
        destination = "media_attachments/my-art",
    },
    {
        name = "things-I-almost-posted",
        source = "/home/ritz/pictures/things-i-almost-posted",
        destination = "media_attachments/things-i-almost-posted",
    },
    {
        name = "poem-pictures",
        source = "/home/ritz/pictures/poem-pictures",
        destination = "media_attachments/poem-pictures",
    },
},
-- }}}
Config Field Definitions
| Field | Required | Description |
|---|---|---|
| name | Yes | Unique identifier for logging and CLI targeting |
| source | Yes | Absolute path to external directory |
| destination | Yes | Relative path under input/ |
| optional | No | Default false. If true, missing source shows warning instead of error |
Note: No pattern or select fields needed - archive selection logic (newest by mtime)
is already implemented in downstream scripts (see issue 7-005).
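The field rules above can be checked mechanically before any syncing starts. The following is a minimal sketch in shell, not the project's actual validation code: the function names `validate_external_entry` and `duplicate_names` are hypothetical, and it assumes the caller has already extracted each entry's fields from config.lua.

```shell
# Hypothetical helper (not from the project): validate one external_files entry.
# Usage: validate_external_entry NAME SOURCE DESTINATION
# Returns 0 if the entry is well-formed, 1 otherwise (reason on stderr).
validate_external_entry() {
    local name="$1" source="$2" destination="$3"

    if [ -z "$name" ] || [ -z "$source" ] || [ -z "$destination" ]; then
        echo "external_files: name, source and destination are all required" >&2
        return 1
    fi

    # source must be an absolute path
    case "$source" in
        /*) ;;
        *)  echo "external_files[$name]: source must be an absolute path" >&2
            return 1 ;;
    esac

    # destination must be relative and must not climb out of input/
    case "$destination" in
        /*|..|../*|*/..|*/../*)
            echo "external_files[$name]: destination must stay under input/" >&2
            return 1 ;;
    esac
    return 0
}

# Hypothetical helper: read one name per line on stdin and print every name
# that occurs more than once (names are required to be unique).
duplicate_names() {
    sort | uniq -d
}
```

For example, `validate_external_entry "bluesky-car" "/home/ritz/backups/bluesky/input" "bluesky"` succeeds, while a relative source or a destination such as `../elsewhere` is rejected.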
Hardcoded References to Fix
Scripts must accept generic arguments and not depend on specific filenames.
scripts/update (lines 117-129, 199-206)
Current (hardcoded):
BLUESKY_BACKUP_DIR="/home/ritz/backups/bluesky/input"
...
if [ -f "${DIR}/input/bluesky/repo.car" ] || [ -f "${DIR}/input/bluesky/repo-2.car" ]; then
if [ -f "${DIR}/input/bluesky/repo-2.car" ]; then
CAR_FILE="${DIR}/input/bluesky/repo-2.car"
elif [ -f "${DIR}/input/bluesky/repo.car" ]; then
CAR_FILE="${DIR}/input/bluesky/repo.car"
Fix: Find ANY .car file in input/bluesky/, prefer newest by mtime:
CAR_FILE=$(find "${DIR}/input/bluesky" -name "*.car" -type f -printf '%T@ %p\n' 2>/dev/null | sort -rn | head -1 | cut -d' ' -f2-)
if [ -n "$CAR_FILE" ]; then
"${DIR}/scripts/extract-bluesky-data" "$CAR_FILE" ...
fi
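To illustrate why the mtime-based lookup removes the filename dependency, here is a small self-contained sketch. The function name `newest_matching_file` and the demo file names are hypothetical; like the fix above, it assumes GNU find (for `-printf`) and GNU touch (for `-d`).

```shell
# Hypothetical helper: print the newest regular file under directory $1 whose
# name matches glob $2, or print nothing if no file matches.
newest_matching_file() {
    find "$1" -name "$2" -type f -printf '%T@ %p\n' 2>/dev/null \
        | sort -rn | head -n 1 | cut -d' ' -f2-
}

# Demo in a throwaway directory: the newest archive wins regardless of its name.
demo_dir=$(mktemp -d)
touch -d '2026-01-01 00:00:00' "$demo_dir/repo.car"
touch -d '2026-01-02 00:00:00' "$demo_dir/repo-2.car"
touch -d '2026-01-03 00:00:00' "$demo_dir/export-renamed.car"
newest_matching_file "$demo_dir" '*.car'   # prints the path of export-renamed.car
rm -rf "$demo_dir"
```

Because `cut -d' ' -f2-` keeps everything after the timestamp, paths containing spaces survive intact; paths containing newlines would not.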
scripts/extract-bluesky-data (line 20)
Current (hardcoded default):
local INPUT_CAR = arg[1] or DIR .. "/input/bluesky/repo-2.car"
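Fix: If no path is provided, find the newest .car file by mtime (matching scripts/update):
local function find_car_file()
    -- Newest-first by mtime; -printf requires GNU find.
    local cmd = "find '" .. DIR .. "/input/bluesky' -name '*.car' -type f -printf '%T@ %p\\n' 2>/dev/null"
        .. " | sort -rn | head -1 | cut -d' ' -f2-"
    local handle = io.popen(cmd)
    if not handle then return nil end
    local result = handle:read("*l")  -- nil when nothing matched
    handle:close()
    return result
end
local INPUT_CAR = arg[1] or find_car_file() or error("No CAR file found in input/bluesky/")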
config.lua (lines 190, 196, 202)
Current: Already config-driven via image_sync.sources - these will be migrated to external_files
Sections to Deprecate
After implementing external_files, these sections become redundant:
| Section | Replacement |
|---|---|
| image_sync.sources | Merged into external_files with destination = "media_attachments/..." |
Note: The image_sync section has additional options (preserve_structure, overwrite_existing,
supported_formats) that should be moved to either global sync settings or per-entry options
in external_files.
Implementation Steps
Phase 1: Fix Hardcoded Filenames
- [x] Update scripts/update to find any .car file by pattern (not specific names)
- [x] Update scripts/extract-bluesky-data to find CAR file dynamically if not provided
Phase 2: Centralize External Sources
- [x] Add external_files section to config.lua
- [x] Create libs/external-sync.lua module to process external_files config
- [x] Update scripts/update to use external-sync module (replace hardcoded Bluesky sync)
- [x] Merge image_sync.sources entries into external_files (already present in external_files)
- [x] Update scripts/update-words to iterate external_files instead of custom logic
Phase 3: CLI Integration
- [x] Add --list-external CLI flag to show configured external sources
- [x] Add --sync-only NAME CLI flag to sync a single source
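As a rough illustration of how the two flags could be parsed, the sketch below uses a hypothetical function `parse_external_flags` that sets two hypothetical variables, `ACTION` and `SYNC_ONLY_NAME`. It is not taken from run.sh, which presumably handles other options as well; here anything unrecognized is simply rejected.

```shell
# Hypothetical parser for the two flags described above.
# Sets ACTION to "sync-all" (default), "list", or "sync-one",
# and SYNC_ONLY_NAME to the requested source name for "sync-one".
parse_external_flags() {
    ACTION="sync-all"
    SYNC_ONLY_NAME=""
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --list-external)
                ACTION="list" ;;
            --sync-only)
                if [ "$#" -lt 2 ]; then
                    echo "--sync-only requires a source NAME" >&2
                    return 2
                fi
                ACTION="sync-one"
                SYNC_ONLY_NAME="$2"
                shift ;;   # consume NAME; the shift below consumes the flag
            *)
                echo "unknown option: $1" >&2
                return 2 ;;
        esac
        shift
    done
}
```

After `parse_external_flags --sync-only bluesky-car`, `ACTION` is `sync-one` and `SYNC_ONLY_NAME` is `bluesky-car`, which matches the name field of the corresponding external_files entry.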
Phase 4: Cleanup
- [x] Remove deprecated image_sync.sources section
- [x] Verify no hardcoded external paths remain (note: /home/ritz/backups/words/sync-to-projects in update-words is out of scope - it's a separate words/notes sync system)
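The verification step could be approximated with a recursive grep. This is only a sketch under assumptions: the function name `find_hardcoded_paths` is hypothetical, and it treats any occurrence of `/home/` as a hardcoded external path, which may be too broad or too narrow for the real scripts.

```shell
# Hypothetical check: print every line under directory $1 that mentions an
# absolute /home/ path, skipping lines that match the optional allow-list
# pattern $2. Exits 0 if offending lines were printed, non-zero if none remain.
find_hardcoded_paths() {
    local dir="$1" allow="${2:-}"
    if [ -n "$allow" ]; then
        grep -rn -- '/home/' "$dir" 2>/dev/null | grep -v -- "$allow"
    else
        grep -rn -- '/home/' "$dir" 2>/dev/null
    fi
}
```

Running `find_hardcoded_paths scripts 'sync-to-projects'` from the project root would list remaining candidates while ignoring the out-of-scope sync-to-projects reference.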
Files to Update
| File | Changes |
|---|---|
| config.lua | Add external_files section |
| libs/external-sync.lua | New module for syncing logic |
| scripts/update-words | Replace custom image sync logic (sync_images_from_config) with external_files iteration |
| scripts/update | Replace hardcoded Bluesky CAR sync |
| run.sh | Add CLI flags for external source management |
Success Criteria
- [x] All external file syncing declared in external_files config
- [x] No hardcoded external paths in scripts (Bluesky, images)
- [x] Scripts accept generic arguments (not specific filenames like repo-2.car)
- [x] Scripts discover files by pattern/extension, not hardcoded names
- [x] CLI flags for listing and selectively syncing sources
- [x] Missing required sources produce fatal errors
- [x] Missing optional sources produce attention messages
- [x] Easy to add/remove/disable external sources via config only
Related Sub-Issues
- 10-003a: Initial config file consolidation (COMPLETED)
- 10-003c: Unified input sources structure
ISSUE STATUS: COMPLETED
Implementation Notes (2026-01-30)
Files Created
- libs/external-sync.lua - Module for reading external_files config and syncing via rsync
- scripts/sync-external-files - CLI wrapper for the external-sync module
Files Modified
- config.lua - Added external_files section, removed image_sync section
- scripts/update - Replaced hardcoded Bluesky sync with call to sync-external-files
- scripts/update-words - Replaced 100+ line sync_images_from_config() with call to sync-external-files
- run.sh - Added --list-external and --sync-only NAME CLI flags
Key Design Decisions
- rsync-based: Uses rsync with -a --ignore-existing for efficient incremental syncing
- Config-driven: All sources defined in external_files array in config.lua
- Optional sources: Sources with optional = true skip with warning instead of error
- CLI integration: Both run.sh and standalone scripts/sync-external-files available
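The decisions above can be sketched as a single shell function. This is an illustrative approximation, not the code in libs/external-sync.lua: the name `sync_external_source` is hypothetical, and the caller is assumed to pass the destination already resolved under input/ (for example `input/bluesky`).

```shell
# Hypothetical helper mirroring the design decisions above.
# Usage: sync_external_source NAME SOURCE DESTINATION [OPTIONAL]
# Returns 0 on success or when a missing optional source is skipped,
# 1 when a required source is missing.
sync_external_source() {
    local name="$1" source="$2" destination="$3" optional="${4:-false}"

    if [ ! -d "$source" ]; then
        if [ "$optional" = "true" ]; then
            echo "warning: [$name] optional source missing, skipping: $source" >&2
            return 0
        fi
        echo "error: [$name] required source missing: $source" >&2
        return 1
    fi

    mkdir -p "$destination" || return 1
    # Trailing slashes copy the directory contents rather than the directory itself.
    # -a preserves attributes; --ignore-existing leaves files that are already
    # present in the destination untouched, so repeat runs only copy new files.
    rsync -a --ignore-existing "$source/" "$destination/"
}
```

One consequence of `--ignore-existing` worth keeping in mind: a file that changes at the source after its first sync is not refreshed in the destination, since only files absent from the destination are copied.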
Out of Scope
- The /home/ritz/backups/words/sync-to-projects script in update-words is a separate words/notes sync system, not part of this centralization effort