issues/completed/10-022-fix-empty-embeddings-validation.md
Issue 10-022: Fix Empty Embeddings Validation
Status: COMPLETED
Current Behavior
When the embeddings.json file exists but contains an empty embeddings array (e.g., due to failed embedding generation from network errors), the GPU similarity module crashes with:
luajit: libs/vulkan-compute/lua/vk_similarity.lua:273: attempt to index a nil value
And the generate-embeddings.sh script crashes with:
lua: (command line):42: bad argument #6 to 'format' (not a number in proper range)
The root cause is that embedding generation failed (due to Ollama network errors), leaving an empty embeddings array:
{
  "embeddings": [],
  "metadata": {
    "processing_mode": "terminated_network_error",
    "completed_embeddings": 0,
    "termination_reason": "consecutive_network_errors"
  }
}
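Both crashes follow from this empty array. The sketch below reproduces the two failure modes in isolation. It assumes the similarity module reads the first embedding (for example, to find the vector dimension) and that the statistics code divides by the embedding count; neither line is shown in this issue, and the field name `embedding` is hypothetical.

```lua
-- Hypothetical stand-in for the decoded embeddings.json shown above.
local embeddings_data = {
  embeddings = {},
  metadata = { completed_embeddings = 0 },
}

-- Failure 1: the first element of an empty array is nil, so indexing it raises
-- "attempt to index a nil value".
local ok, err = pcall(function()
  -- `embedding` is an assumed field name, used only for illustration.
  return #embeddings_data.embeddings[1].embedding
end)
print(ok, err)

-- Failure 2: 0 / 0 is NaN, the only value that is not equal to itself.
local successful, total = 0, #embeddings_data.embeddings
local ratio = successful / total
print(ratio ~= ratio)
```

A NaN passed to a `%d` format specifier is a plausible source of the "not a number in proper range" message, since NaN has no integer representation.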
Intended Behavior
- GPU similarity module should detect empty embeddings and provide a clear error message with remediation steps
- generate-embeddings.sh should handle zero embeddings without division by zero errors
- Error messages should explain the root cause (network failure) and how to fix it
Implementation
vk_similarity.lua (lines 266-287)
Added validation after loading embeddings:
local num_poems = #embeddings_data.embeddings
-- Validate embeddings array is non-empty before accessing first element
-- Empty arrays occur when embedding generation failed (network errors, etc.)
if num_poems == 0 then
    local reason = embeddings_data.metadata and embeddings_data.metadata.termination_reason or "unknown"
    local mode = embeddings_data.metadata and embeddings_data.metadata.processing_mode or "unknown"
    error(string.format(
        "[GPU SIMILARITY ERROR] Embeddings array is empty (0 poems).\n" ..
        " Processing mode: %s\n" ..
        " Termination reason: %s\n" ..
        " Remedy: Regenerate embeddings with: ./run.sh --generate-embeddings --force\n" ..
        " Ensure Ollama is running: ollama serve",
        mode, reason
    ))
end
Same fix applied to both:
- generate_similarity_matrix_gpu_parallel() (line 260)
- generate_similarity_matrix_gpu() (line 539, deprecated sequential version)
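Since the same check now lives in two functions, it could also be factored into one helper. The following is a minimal sketch of that idea, not the code that was committed: the helper name `assert_non_empty_embeddings` is hypothetical, and the error text is shortened. Wrapping the call in `pcall` lets a caller report the message instead of aborting.

```lua
-- Hypothetical shared helper; the issue applied the check inline in both functions.
local function assert_non_empty_embeddings(embeddings_data)
  local num_poems = #embeddings_data.embeddings
  if num_poems == 0 then
    -- Fall back to an empty table so missing metadata reads as "unknown".
    local meta = embeddings_data.metadata or {}
    error(string.format(
      "[GPU SIMILARITY ERROR] Embeddings array is empty (0 poems).\n" ..
      " Processing mode: %s\n" ..
      " Termination reason: %s",
      meta.processing_mode or "unknown",
      meta.termination_reason or "unknown"
    ))
  end
  return num_poems
end

-- Usage: pcall turns the raised error into a value the caller can print.
local ok, err = pcall(assert_non_empty_embeddings, {
  embeddings = {},
  metadata = { termination_reason = "consecutive_network_errors" },
})
print(ok, err)
```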
generate-embeddings.sh (lines 811-819)
Added guards against division by zero in statistics calculation:
-- Guard against division by zero when embeddings array is empty
local success_rate = 0
if total > 0 then
    success_rate = math.floor((successful / total) * 100)
end
local processing_rate = 0
if $TOTAL_TIME > 0 then
    processing_rate = math.floor(successful * 3600 / $TOTAL_TIME)
end
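The two guards follow one pattern, which can be sketched as a small standalone helper. `safe_rate` is a hypothetical name, and plain Lua variables stand in for the shell-interpolated `$TOTAL_TIME`, so this is an illustration rather than the script's actual code.

```lua
-- Hypothetical helper: floor(numerator * scale / denominator), or 0 when the
-- denominator is not positive, so NaN or infinity never reaches string.format.
local function safe_rate(numerator, denominator, scale)
  if denominator > 0 then
    return math.floor(numerator * scale / denominator)
  end
  return 0
end

-- The failing case from this issue: nothing succeeded and no time was recorded.
local successful, total, total_time = 0, 0, 0
print(string.format("Success rate: %d%%", safe_rate(successful, total, 100)))
print(string.format("Processing rate: %d/hour", safe_rate(successful, total_time, 3600)))
```

Returning 0 for an empty run keeps the statistics printable; the failure itself is better reported separately than encoded in the rate.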
Files Modified
- libs/vulkan-compute/lua/vk_similarity.lua - Added empty array validation with descriptive error
- generate-embeddings.sh:
  - Added division by zero guards in statistics calculation
  - Changed from piped stdin (echo | lua -I) to direct function call via luajit -e
  - The piped stdin caused curl exit code 7 (connection refused) due to file descriptor inheritance issues
  - Added explicit failure detection when SUCCESSFUL=0 or processing_mode=terminated_network_error
  - Previously would show "GENERATION SUCCESSFUL" even when 0 embeddings were created
Lessons Learned
- Always validate array bounds before accessing elements, especially when loading external data
- Division operations need guards when denominators can be zero
- Error messages should include context from metadata (like termination_reason) to help diagnosis
- The "attempt to index a nil value" error in Lua often indicates array bounds issues
- Piped stdin can cause subprocess failures: When running echo "input" | lua script.lua, the script's child processes (like curl via io.popen) can fail with exit code 7 (connection refused) due to inherited file descriptors
- Use direct function calls (luajit -e "require('module').function()") instead of piped interactive modes when calling from scripts
- Success messages should verify actual results, not just exit codes - a script can exit 0 but produce 0 useful outputs
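The point about verifying results can be sketched as a check on the decoded output rather than on the exit status. This assumes the JSON has already been decoded into a Lua table (the decoder is not shown in this issue); the function name `generation_succeeded` and the exact messages are hypothetical.

```lua
-- Hypothetical check: success requires at least one embedding and a
-- processing mode that does not indicate early termination.
local function generation_succeeded(data)
  local count = #(data.embeddings or {})
  local mode = data.metadata and data.metadata.processing_mode or "unknown"
  if count == 0 then
    return false, "0 embeddings were created (processing_mode=" .. mode .. ")"
  end
  if mode == "terminated_network_error" then
    return false, "generation terminated early after " .. count .. " embeddings"
  end
  return true
end

-- Usage with the metadata from this issue: the run is reported as failed
-- even if the generating process itself exited with status 0.
local ok, why = generation_succeeded({
  embeddings = {},
  metadata = { processing_mode = "terminated_network_error" },
})
print(ok and "GENERATION SUCCESSFUL" or ("GENERATION FAILED: " .. why))
```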
Related Issues
- 8-004: Implement embedding validation and empty poem handling
- 10-017: Multi-Ollama server configuration (network connectivity)
Completed
2026-02-10