issues/completed/5-010b-implement-validation-framework.md
Issue 010b: Implement Validation Framework
Current Behavior
- No systematic validation of similarity scores in stored JSON files
- No verification that cached similarity data matches actual calculations
- Missing framework for testing similarity algorithm accuracy
- No automated validation reports or error detection
Intended Behavior
- Iterative validation system that recalculates and verifies stored similarity scores
- Comprehensive validation reports with accuracy metrics and discrepancies
- Support for validating full datasets or statistical samples
- Automated detection of data corruption or calculation errors
Suggested Implementation Steps
- Validation Engine: Core system for iterating through similarity data and validating scores
- Data Loading: Efficient loading and processing of similarity matrices and embeddings
- Comparison Logic: Compare stored vs calculated scores with configurable tolerance
- Report Generation: Detailed validation reports with statistics and error analysis
- Sampling System: Support for validating representative samples of large datasets
Technical Requirements
Validation Engine Architecture
-- {{{ ValidationEngine class
local ValidationEngine = {}
ValidationEngine.__index = ValidationEngine
function ValidationEngine:new(config)
local obj = {
config = config or {},
calculator = nil, -- Will be set from modular calculator
tolerance = config.tolerance or 0.001,
sample_size = config.sample_size or nil, -- nil = validate all
progress_callback = config.progress_callback or nil,
validation_results = {
total_comparisons = 0,
accurate_scores = 0,
inaccurate_scores = 0,
missing_embeddings = 0,
errors = {},
discrepancies = {},
start_time = nil,
end_time = nil
}
}
setmetatable(obj, ValidationEngine)
return obj
end
function ValidationEngine:set_calculator(calculator)
self.calculator = calculator
end
-- {{{ function ValidationEngine:validate_similarity_matrix
function ValidationEngine:validate_similarity_matrix(similarity_file, embeddings_file)
if not self.calculator then
error("Similarity calculator must be set before validation")
end
self.validation_results.start_time = os.time()
utils.log_info(string.format("Starting validation: %s vs %s", similarity_file, embeddings_file))
-- Load data
local similarity_data = utils.load_json(similarity_file)
local embeddings_data = utils.load_json(embeddings_file)
if not similarity_data or not embeddings_data then
error("Failed to load validation data files")
end
-- Create validation sample
local validation_pairs = self:create_validation_sample(similarity_data, embeddings_data)
utils.log_info(string.format("Validating %d similarity pairs...", #validation_pairs))
-- Validate each pair
local progress_interval = math.max(1, math.floor(#validation_pairs / 20))
for i, pair in ipairs(validation_pairs) do
if i % progress_interval == 0 and self.progress_callback then
self.progress_callback(i, #validation_pairs)
end
self:validate_similarity_pair(pair, embeddings_data)
end
self.validation_results.end_time = os.time()
return self:generate_validation_report()
end
-- }}}
Similarity Pair Validation
-- {{{ function ValidationEngine:validate_similarity_pair
function ValidationEngine:validate_similarity_pair(pair, embeddings_data)
local poem_a_id, poem_b_id, stored_score = pair.poem_a, pair.poem_b, pair.stored_score
self.validation_results.total_comparisons = self.validation_results.total_comparisons + 1
-- Get embeddings
local embedding_a = embeddings_data[tostring(poem_a_id)]
local embedding_b = embeddings_data[tostring(poem_b_id)]
if not embedding_a or not embedding_b then
self.validation_results.missing_embeddings = self.validation_results.missing_embeddings + 1
table.insert(self.validation_results.errors, {
type = "missing_embedding",
poem_a = poem_a_id,
poem_b = poem_b_id,
missing = not embedding_a and "poem_a" or "poem_b"
})
return
end
-- Calculate actual similarity
local success, calculated_score = pcall(function()
return self.calculator:calculate(embedding_a, embedding_b)
end)
if not success then
table.insert(self.validation_results.errors, {
type = "calculation_error",
poem_a = poem_a_id,
poem_b = poem_b_id,
error = calculated_score -- This will contain the error message
})
return
end
-- Compare scores
local difference = math.abs(calculated_score - stored_score)
if difference <= self.tolerance then
self.validation_results.accurate_scores = self.validation_results.accurate_scores + 1
else
self.validation_results.inaccurate_scores = self.validation_results.inaccurate_scores + 1
table.insert(self.validation_results.discrepancies, {
poem_a = poem_a_id,
poem_b = poem_b_id,
stored_score = stored_score,
calculated_score = calculated_score,
difference = difference,
relative_error = difference / math.abs(stored_score)
})
end
end
-- }}}
-- {{{ function ValidationEngine:create_validation_sample
function ValidationEngine:create_validation_sample(similarity_data, embeddings_data)
local all_pairs = {}
-- Extract all similarity pairs from data
for poem_a_id, similarities in pairs(similarity_data) do
for poem_b_id, score in pairs(similarities) do
table.insert(all_pairs, {
poem_a = tonumber(poem_a_id),
poem_b = tonumber(poem_b_id),
stored_score = tonumber(score)
})
end
end
utils.log_info(string.format("Found %d total similarity pairs", #all_pairs))
-- Apply sampling if requested
if self.sample_size and self.sample_size < #all_pairs then
utils.log_info(string.format("Sampling %d pairs for validation", self.sample_size))
-- Random sampling
local sampled_pairs = {}
local used_indices = {}
math.randomseed(os.time())
while #sampled_pairs < self.sample_size do
local random_index = math.random(1, #all_pairs)
if not used_indices[random_index] then
table.insert(sampled_pairs, all_pairs[random_index])
used_indices[random_index] = true
end
end
return sampled_pairs
else
return all_pairs
end
end
-- }}}
Report Generation System
-- {{{ function ValidationEngine:generate_validation_report
function ValidationEngine:generate_validation_report()
local results = self.validation_results
local duration = results.end_time - results.start_time
-- Calculate statistics
local accuracy_rate = results.total_comparisons > 0 and
(results.accurate_scores / results.total_comparisons) or 0
local error_rate = results.total_comparisons > 0 and
(results.inaccurate_scores / results.total_comparisons) or 0
local report = {
algorithm = self.calculator.algorithm,
timestamp = os.date("%Y-%m-%d %H:%M:%S", results.start_time),
duration_seconds = duration,
statistics = {
total_comparisons = results.total_comparisons,
accurate_scores = results.accurate_scores,
inaccurate_scores = results.inaccurate_scores,
missing_embeddings = results.missing_embeddings,
accuracy_rate = accuracy_rate,
error_rate = error_rate,
tolerance = self.tolerance
},
performance = {
comparisons_per_second = duration > 0 and (results.total_comparisons / duration) or 0,
avg_comparison_time_ms = duration > 0 and (duration * 1000 / results.total_comparisons) or 0
},
discrepancies = {
count = #results.discrepancies,
samples = self:get_worst_discrepancies(10),
max_difference = self:get_max_discrepancy(),
avg_difference = self:get_average_discrepancy()
},
errors = {
count = #results.errors,
by_type = self:group_errors_by_type(),
samples = results.errors
},
recommendations = self:generate_recommendations()
}
return report
end
-- }}}
-- {{{ function ValidationEngine:get_worst_discrepancies
function ValidationEngine:get_worst_discrepancies(limit)
local sorted_discrepancies = {}
for _, disc in ipairs(self.validation_results.discrepancies) do
table.insert(sorted_discrepancies, disc)
end
-- Sort by difference (highest first)
table.sort(sorted_discrepancies, function(a, b)
return a.difference > b.difference
end)
local result = {}
for i = 1, math.min(limit, #sorted_discrepancies) do
table.insert(result, sorted_discrepancies[i])
end
return result
end
-- }}}
-- {{{ function ValidationEngine:generate_recommendations
function ValidationEngine:generate_recommendations()
local results = self.validation_results
local recommendations = {}
local accuracy_rate = results.total_comparisons > 0 and
(results.accurate_scores / results.total_comparisons) or 0
if accuracy_rate < 0.95 then
table.insert(recommendations, "Low accuracy rate detected. Consider investigating calculation differences or updating stored similarity data.")
end
if results.missing_embeddings > 0 then
table.insert(recommendations, string.format("%d missing embeddings found. Update embedding data or clean similarity matrix.", results.missing_embeddings))
end
if #results.errors > 0 then
table.insert(recommendations, string.format("%d calculation errors occurred. Check embedding data quality and calculator implementation.", #results.errors))
end
local max_diff = self:get_max_discrepancy()
if max_diff and max_diff > 0.1 then
table.insert(recommendations, string.format("Maximum discrepancy of %.4f detected. Consider tightening tolerance or investigating calculation method.", max_diff))
end
if accuracy_rate > 0.99 then
table.insert(recommendations, "Excellent accuracy rate. Stored similarity data appears reliable.")
end
return recommendations
end
-- }}}
Batch Validation and Reporting
-- {{{ function run_comprehensive_validation
function run_comprehensive_validation(similarity_files, embeddings_files, algorithms, output_dir)
local comprehensive_results = {
timestamp = os.date("%Y-%m-%d %H:%M:%S"),
algorithms_tested = {},
overall_statistics = {
total_files = #similarity_files,
total_algorithms = #algorithms,
successful_validations = 0,
failed_validations = 0
},
file_results = {}
}
for _, algorithm in ipairs(algorithms) do
utils.log_info(string.format("Testing algorithm: %s", algorithm))
local calculator = SimilarityCalculator:new(algorithm, {cache_enabled = true})
local engine = ValidationEngine:new({
tolerance = 0.001,
sample_size = 1000 -- Sample for large datasets
})
engine:set_calculator(calculator)
local algorithm_results = {
algorithm = algorithm,
files_validated = 0,
total_accuracy = 0,
validations = {}
}
for i, similarity_file in ipairs(similarity_files) do
local embeddings_file = embeddings_files[i]
utils.log_info(string.format("Validating file %d/%d with %s", i, #similarity_files, algorithm))
local success, validation_result = pcall(function()
return engine:validate_similarity_matrix(similarity_file, embeddings_file)
end)
if success then
table.insert(algorithm_results.validations, validation_result)
algorithm_results.files_validated = algorithm_results.files_validated + 1
algorithm_results.total_accuracy = algorithm_results.total_accuracy + validation_result.statistics.accuracy_rate
comprehensive_results.overall_statistics.successful_validations = comprehensive_results.overall_statistics.successful_validations + 1
else
utils.log_error(string.format("Validation failed: %s", validation_result))
comprehensive_results.overall_statistics.failed_validations = comprehensive_results.overall_statistics.failed_validations + 1
end
end
-- Calculate average accuracy for algorithm
algorithm_results.average_accuracy = algorithm_results.files_validated > 0 and
(algorithm_results.total_accuracy / algorithm_results.files_validated) or 0
table.insert(comprehensive_results.algorithms_tested, algorithm_results)
end
-- Generate comprehensive report
local report_file = output_dir .. "/validation_comprehensive_report.json"
utils.write_json(report_file, comprehensive_results)
utils.log_info(string.format("Comprehensive validation complete. Report saved: %s", report_file))
return comprehensive_results
end
-- }}}
Quality Assurance Criteria
- Validation engine accurately detects discrepancies in similarity scores
- Support for both full dataset and sampled validation
- Comprehensive error handling and reporting
- Performance suitable for large datasets (6,840+ poems)
- Clear, actionable validation reports
Success Metrics
- Accuracy Detection: Correctly identify >99% of score discrepancies
- Performance: Validate 1000+ similarity pairs per minute
- Sample Validation: Representative sampling maintains >95% accuracy detection
- Error Handling: Graceful handling of missing data and calculation errors
- Report Quality: Clear, actionable recommendations in validation reports
Dependencies
- Issue 010a (modular similarity calculator - required)
- Phase 2 embedding and similarity data files
- JSON data loading utilities
Testing Strategy
- Accuracy Testing: Validate with known correct/incorrect similarity scores
- Performance Testing: Test with large datasets to verify scalability
- Error Handling: Test with corrupted data, missing embeddings
- Sampling Testing: Verify sample validation represents full dataset
- Integration Testing: Test with multiple similarity algorithms
ISSUE STATUS: COMPLETED ✅🔍📊
Priority: High - Critical for ensuring similarity data integrity
Implementation Completed
Files Created:
/src/validation-engine.lua- Core validation framework with comprehensive validation capabilities/src/test-validation-engine.lua- Complete test suite with mock and real data testing/src/run-validation.lua- Command-line interface for running validation operations
Features Implemented:
- ValidationEngine class with configurable tolerance and sampling
- Support for validating stored similarity scores against recalculated values
- Comprehensive error handling for missing embeddings and calculation errors
- Statistical reporting with accuracy rates, discrepancies, and performance metrics
- Multi-algorithm validation support using the modular similarity calculator
- Interactive CLI mode with algorithm selection and file configuration
- Batch validation capabilities for multiple datasets
- Detailed recommendations based on validation results
Testing Results: All 5 validation framework tests pass successfully
- Basic functionality validation ✅
- Mock data validation with artificial similarity pairs ✅
- Real project data validation with sampling ✅
- Multi-algorithm validation across 4 different algorithms ✅
- Error handling with missing calculators and malformed data ✅
Integration: Successfully integrates with Issue 010a (modular similarity calculator) as dependency
Next Steps: Ready for use in Issues 010c (validation reports) and for ongoing similarity data integrity verification