Skip to content

Commit 158bdef

Browse files
Fix oob bias access for MatMulIntegerToFloat and DynamicQuantizeMatMul (#28499)
### Description Fixes a heap out-of-bounds read vulnerability in `DynamicQuantizeMatMul` and `MatMulIntegerToFloat` where a bias tensor with an incorrect number of elements could cause memory reads beyond the allocated buffer. ## Changes - **`dynamic_quantize_matmul.cc`**: Added element count validation for the bias tensor in both the `ComputeCommon` path and the deferred bias addition path (KleidiAI). - **`matmul_integer_base.h`**: Added element count validation in the KleidiAI pre-pack path, causing fallback to `ComputeCommon` (which then rejects the invalid bias with a clear error). - **Tests**: Added regression tests covering runtime bias mismatch, initializer bias mismatch (KleidiAI fallback), and the generic (non-KleidiAI) path for both operators. ## Why we validate element count, not shape (rank) The validation checks `bias_tensor->Shape().Size() == N` (total element count) rather than enforcing that the bias is strictly 1D. This is intentional for several reasons: 1. **Backward compatibility with existing models.** It's possible that some models may have bias tensors with shape `(1, N)` instead of `(N)`. Enforcing rank == 1 would break these models at runtime. This exact issue occurred with the GroupQueryAttention operator, which required relaxing its shape validation in PR #28259. 2. **Consistent with ONNX standard practice.** Most official ONNX operator schemas (Conv, ConvTranspose, DeformConv, Gemm, LayerNormalization) do *not* validate bias shape in their schema's `TypeAndShapeInferenceFunction`; they only document "1D" in the input description text. `BatchNormalization` is the only exception. 3. **The kernel only needs N contiguous floats.** The compute implementation accesses bias via raw data pointer (`bias->Data<float>()`) and reads exactly `N` elements. It never indexes into specific dimensions or assumes a particular rank. A bias of shape `(N)`, `(1, N)`, or `(1, 1, N)` all work identically. 4. **Schema constraints cannot be relaxed without a version bump.** If we added a strict rank check to the schema now and later discovered models using `(1, N)`, fixing it would probably require a new opset version (though we've never actually bumped the version for contrib ops ...). ## Motivation and Context Without this fix, passing a bias tensor with fewer elements than `B`'s last dimension causes the kernel to read past the end of the bias buffer, potentially exposing sensitive memory contents or causing a crash.
1 parent 84d03c5 commit 158bdef

4 files changed

Lines changed: 153 additions & 2 deletions

File tree

onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_matmul.cc

Lines changed: 12 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -80,6 +80,12 @@ Status MatMulIntegerToFloatBase::ComputeCommon(OpKernelContext* ctx,
8080
if (y->Shape().Size() == 0)
8181
return Status::OK();
8282

83+
if (bias_tensor != nullptr) {
84+
ORT_RETURN_IF_NOT(bias_tensor->Shape().Size() == static_cast<int64_t>(helper.N()),
85+
"bias tensor's element count must equal B's last dimension (",
86+
helper.N(), "), but got ", bias_tensor->Shape().Size());
87+
}
88+
8389
auto* y_data = y->MutableData<float>();
8490
const auto* bias_data = bias_tensor != nullptr ? bias_tensor->Data<float>() : nullptr;
8591

@@ -306,8 +312,12 @@ Status DynamicQuantizeMatMul::Compute(OpKernelContext* ctx) const {
306312
// This evaluates to true if bias data was not provided as constant data for prepacking stage
307313
if (!dynamic_quant_mlas_bias_data_was_packed_) {
308314
if (ctx->Input<Tensor>(IN_BIAS) != nullptr) {
309-
const auto biases = std::vector<float>(&ctx->Input<Tensor>(IN_BIAS)->Data<float>()[0],
310-
&ctx->Input<Tensor>(IN_BIAS)->Data<float>()[gemm_shape.N]);
315+
const Tensor* bias_t = ctx->Input<Tensor>(IN_BIAS);
316+
ORT_RETURN_IF_NOT(bias_t->Shape().Size() == static_cast<int64_t>(gemm_shape.N),
317+
"bias tensor's element count must equal B's last dimension (",
318+
gemm_shape.N, "), but got ", bias_t->Shape().Size());
319+
const auto biases = std::vector<float>(&bias_t->Data<float>()[0],
320+
&bias_t->Data<float>()[gemm_shape.N]);
311321

312322
// deferred adding of bias
313323
for (size_t gemm_idx = 0; gemm_idx < num_gemms; gemm_idx++) {

onnxruntime/core/providers/cpu/quantization/matmul_integer_base.h

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -208,6 +208,9 @@ class MatMulIntegerBase : public OpKernel {
208208
}
209209

210210
if (ctx.bias != nullptr) {
211+
if (ctx.bias->Shape().Size() != static_cast<int64_t>(ctx.N)) {
212+
return false;
213+
}
211214
dynamic_quant_mlas_bias_data_was_packed_ = true;
212215
}
213216

onnxruntime/test/contrib_ops/dynamic_quantize_matmul_test.cc

Lines changed: 71 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -421,6 +421,47 @@ TEST(DynamicQuantizeMatMul, KleidiRejectsUnsupportedBShape) {
421421
test.Run();
422422
}
423423

424+
// 6. Mismatched bias (runtime tensor) -> must be rejected at compute time.
425+
TEST(DynamicQuantizeMatMul, KleidiBiasRuntimeShapeMismatch) {
426+
if (!HasArmSME()) GTEST_SKIP();
427+
KleidiDynMatMulData data;
428+
// Bias has only 1 element but N=3 — this must be rejected.
429+
const std::vector<float> bad_bias = {1.0f};
430+
431+
OpTester test("DynamicQuantizeMatMul", 1, kMSDomain);
432+
test.AddInput<float>("A", {data.M, data.K}, data.a);
433+
test.AddInput<int8_t>("B", {data.K, data.N}, data.b, true /*initializer*/);
434+
test.AddInput<float>("b_scale", {data.N}, data.b_scale, true);
435+
test.AddInput<int8_t>("b_zero_point", {data.N}, data.b_zp, true);
436+
test.AddInput<float>("bias", {1}, bad_bias, false /*runtime*/);
437+
test.AddOutput<float>("Y", {data.M, data.N}, std::vector<float>(data.M * data.N, 0.0f));
438+
test.ConfigEp(DefaultCpuExecutionProvider())
439+
.Config(OpTester::ExpectResult::kExpectFailure,
440+
"bias tensor's element count must equal B's last dimension")
441+
.RunWithConfig();
442+
}
443+
444+
// 7. Mismatched bias (constant initializer) -> KleidiAI pre-pack rejects -> falls back to ComputeCommon
445+
// -> rejected
446+
TEST(DynamicQuantizeMatMul, KleidiBiasInitializerShapeMismatch) {
447+
if (!HasArmSME()) GTEST_SKIP();
448+
KleidiDynMatMulData data;
449+
// Bias has only 1 element but N=3 — this must be rejected.
450+
const std::vector<float> bad_bias = {1.0f};
451+
452+
OpTester test("DynamicQuantizeMatMul", 1, kMSDomain);
453+
test.AddInput<float>("A", {data.M, data.K}, data.a);
454+
test.AddInput<int8_t>("B", {data.K, data.N}, data.b, true /*initializer*/);
455+
test.AddInput<float>("b_scale", {data.N}, data.b_scale, true);
456+
test.AddInput<int8_t>("b_zero_point", {data.N}, data.b_zp, true);
457+
test.AddInput<float>("bias", {1}, bad_bias, true /*initializer*/);
458+
test.AddOutput<float>("Y", {data.M, data.N}, std::vector<float>(data.M * data.N, 0.0f));
459+
test.ConfigEp(DefaultCpuExecutionProvider())
460+
.Config(OpTester::ExpectResult::kExpectFailure,
461+
"bias tensor's element count must equal B's last dimension")
462+
.RunWithConfig();
463+
}
464+
424465
#endif // USE_KLEIDIAI
425466

426467
TEST(DynamicQuantizeMatMul, B_PerColumn_ND) {
@@ -486,5 +527,35 @@ TEST(DynamicQuantizeMatMul, B_PerColumn_ND) {
486527
test_case({15, 14, 13}, {15, 13, 27}, {15, 1, 27});
487528
}
488529

530+
// Test that a bias tensor with length mismatched to B's last dimension is rejected.
531+
// This reproduces a heap OOB read when bias is shorter than N.
532+
TEST(DynamicQuantizeMatMul, BiasShapeMismatch) {
533+
constexpr int64_t M = 2;
534+
constexpr int64_t K = 4;
535+
constexpr int64_t N = 8;
536+
537+
std::vector<float> A_data(M * K, 1.0f);
538+
std::vector<uint8_t> B_data(K * N, 128);
539+
std::vector<float> B_scale = {0.5f};
540+
std::vector<uint8_t> B_zero_point = {128};
541+
542+
// Bias has only 1 element but N=8 — this must be rejected.
543+
std::vector<float> bad_bias = {1.0f};
544+
545+
OpTester test("DynamicQuantizeMatMul", 1, onnxruntime::kMSDomain);
546+
test.AddInput<float>("A", {M, K}, A_data);
547+
test.AddInput<uint8_t>("B", {K, N}, B_data);
548+
test.AddInput<float>("b_scale", {1}, B_scale);
549+
test.AddInput<uint8_t>("b_zero_point", {1}, B_zero_point);
550+
test.AddInput<float>("bias", {1}, bad_bias);
551+
552+
test.AddOutput<float>("Y", {M, N}, std::vector<float>(M * N, 0.0f));
553+
554+
test.ConfigEp(DefaultCpuExecutionProvider())
555+
.Config(OpTester::ExpectResult::kExpectFailure,
556+
"bias tensor's element count must equal B's last dimension")
557+
.RunWithConfig();
558+
}
559+
489560
} // namespace test
490561
} // namespace onnxruntime

onnxruntime/test/contrib_ops/matmul_integer_to_float_test.cc

Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -489,5 +489,72 @@ TEST(MatMulIntegerToFloat, MatMulInteger_With_ZeroPoint) {
489489
test_case({15, 14, 13}, {15, 13, 27}, {15, 1, 27});
490490
}
491491

492+
// Test that a bias tensor with length mismatched to B's last dimension is rejected.
493+
// This reproduces a heap OOB read when bias is shorter than N.
494+
TEST(MatMulIntegerToFloat, BiasShapeMismatch) {
495+
constexpr int64_t M = 2;
496+
constexpr int64_t K = 4;
497+
constexpr int64_t N = 8;
498+
499+
std::vector<uint8_t> A_data(M * K, 128);
500+
std::vector<uint8_t> B_data(K * N, 128);
501+
std::vector<float> A_scale = {0.5f};
502+
std::vector<float> B_scale = {0.5f};
503+
std::vector<uint8_t> A_zero_point = {128};
504+
std::vector<uint8_t> B_zero_point = {128};
505+
506+
// Bias has only 1 element but N=8. This must be rejected.
507+
std::vector<float> bad_bias = {1.0f};
508+
509+
OpTester test("MatMulIntegerToFloat", 1, onnxruntime::kMSDomain);
510+
test.AddInput<uint8_t>("A", {M, K}, A_data);
511+
test.AddInput<uint8_t>("B", {K, N}, B_data);
512+
test.AddInput<float>("a_scale", {1}, A_scale);
513+
test.AddInput<float>("b_scale", {1}, B_scale);
514+
test.AddInput<uint8_t>("a_zero_point", {1}, A_zero_point);
515+
test.AddInput<uint8_t>("b_zero_point", {1}, B_zero_point);
516+
test.AddInput<float>("bias", {1}, bad_bias);
517+
518+
test.AddOutput<float>("Y", {M, N}, std::vector<float>(M * N, 0.0f));
519+
520+
test.ConfigEp(DefaultCpuExecutionProvider())
521+
.Config(OpTester::ExpectResult::kExpectFailure,
522+
"bias tensor's element count must equal B's last dimension")
523+
.RunWithConfig();
524+
}
525+
526+
// Test that a bias tensor with length larger than B's last dimension is rejected.
527+
TEST(MatMulIntegerToFloat, BiasShapeMismatch_LargerBias) {
528+
constexpr int64_t M = 2;
529+
constexpr int64_t K = 4;
530+
constexpr int64_t N = 8;
531+
532+
std::vector<uint8_t> A_data(M * K, 128);
533+
std::vector<uint8_t> B_data(K * N, 128);
534+
std::vector<float> A_scale = {0.5f};
535+
std::vector<float> B_scale = {0.5f};
536+
std::vector<uint8_t> A_zero_point = {128};
537+
std::vector<uint8_t> B_zero_point = {128};
538+
539+
// Bias has length > N, which must be rejected.
540+
std::vector<float> bad_bias(static_cast<size_t>(N + 1), 1.0f);
541+
542+
OpTester test("MatMulIntegerToFloat", 1, onnxruntime::kMSDomain);
543+
test.AddInput<uint8_t>("A", {M, K}, A_data);
544+
test.AddInput<uint8_t>("B", {K, N}, B_data);
545+
test.AddInput<float>("a_scale", {1}, A_scale);
546+
test.AddInput<float>("b_scale", {1}, B_scale);
547+
test.AddInput<uint8_t>("a_zero_point", {1}, A_zero_point);
548+
test.AddInput<uint8_t>("b_zero_point", {1}, B_zero_point);
549+
test.AddInput<float>("bias", {N + 1}, bad_bias);
550+
551+
test.AddOutput<float>("Y", {M, N}, std::vector<float>(M * N, 0.0f));
552+
553+
test.ConfigEp(DefaultCpuExecutionProvider())
554+
.Config(OpTester::ExpectResult::kExpectFailure,
555+
"bias tensor's element count must equal B's last dimension")
556+
.RunWithConfig();
557+
}
558+
492559
} // namespace test
493560
} // namespace onnxruntime

0 commit comments

Comments
 (0)