Summary
StrictMetricsEvaluator::CanContainNulls and CanContainNaNs incorrectly return false when the null_value_counts / nan_value_counts map is non-empty but does not contain an entry for the queried field. This causes the evaluator to erroneously return kRowsMustMatch, potentially skipping row-level filtering and returning rows that do not satisfy the predicate.
Root Cause
In src/iceberg/expression/strict_metrics_evaluator.cc:
bool CanContainNulls(int32_t id) {
  if (data_file_.null_value_counts.empty()) {
    return true;
  }
  auto it = data_file_.null_value_counts.find(id);
  // When the field is missing from the map, this evaluates to false.
  return it != data_file_.null_value_counts.cend() && it->second > 0;
}
The same pattern exists in CanContainNaNs.
Reproduction
auto data_file = std::make_shared<DataFile>();
data_file->record_count = 50;
data_file->value_counts = {{14, 50L}};
data_file->null_value_counts = {{4, 0L}, {5, 0L}}; // field 14 missing
data_file->nan_value_counts = {{8, 0L}}; // field 14 missing
data_file->upper_bounds = {{14, Literal::Double(100.0).Serialize().value()}};
data_file->lower_bounds = {{14, Literal::Double(1.0).Serialize().value()}};
// Evaluating: no_nan_stats < 200.0
// Expected: kRowsMightNotMatch (null count unknown)
// Actual: kRowsMustMatch (incorrectly skips filtering)
Proposed Fix
CanContainNulls: if the field is required per schema, return false; if the field is not found in a non-empty map, return true (conservative).
CanContainNaNs: if the field type is not float/double, return false; if the field is not found in a non-empty map, return true (conservative).
This aligns with Java's StrictMetricsEvaluator.canContainNulls() / canContainNaNs() which return true when the field is missing from the map.
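The proposed behavior can be sketched roughly as below. This is a minimal, self-contained illustration, not the actual patch: `FileStats` and `FieldInfo` are hypothetical stand-ins for `DataFile` and the schema lookup, and the free-function signatures are assumptions made so the snippet compiles on its own.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for the DataFile metrics used by the evaluator.
struct FileStats {
  std::map<int32_t, int64_t> null_value_counts;
  std::map<int32_t, int64_t> nan_value_counts;
};

// Hypothetical stand-in for what the schema would tell us about a field.
struct FieldInfo {
  bool required;     // field is declared required in the schema
  bool is_floating;  // field type is float or double
};

bool CanContainNulls(const FileStats& stats, const FieldInfo& field, int32_t id) {
  if (field.required) {
    return false;  // required fields cannot hold nulls
  }
  auto it = stats.null_value_counts.find(id);
  if (it == stats.null_value_counts.cend()) {
    return true;  // no count recorded (empty map or missing entry): be conservative
  }
  return it->second > 0;
}

bool CanContainNaNs(const FileStats& stats, const FieldInfo& field, int32_t id) {
  if (!field.is_floating) {
    return false;  // only float/double columns can hold NaN
  }
  auto it = stats.nan_value_counts.find(id);
  if (it == stats.nan_value_counts.cend()) {
    return true;  // no count recorded (empty map or missing entry): be conservative
  }
  return it->second > 0;
}
```

With the maps from the reproduction (`null_value_counts = {{4, 0}, {5, 0}}`, `nan_value_counts = {{8, 0}}`) and field 14 treated as an optional double, both functions return true for id 14 under this sketch, so the evaluator would no longer be able to conclude kRowsMustMatch from the missing counts alone.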