Cross-hardware float non-determinism as a root cause of GPU prove errors — and a working solution

I've been analysing recurring GPU prove errors in EZKL (#882, #837) and believe a significant class of them originates from a layer below the proof system itself: IEEE-754 float non-determinism between CPU and GPU hardware.

The problem is structural. When EZKL generates a witness on an x86 CPU and the proof is generated on an NVIDIA GPU, the float32 representations of intermediate values can differ at the bit level. The arithmetic is not wrong. Rather, FMA contraction (which shifts rounding below the last bit, i.e. sub-LSB jitter), NaN payload variants, and signed-zero differences between hardware vendors produce divergent bit patterns for values that are arithmetically equivalent or differ only in the last bit. The ZK circuit then sees different witnesses, and the proof fails.
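To make the three divergence sources concrete, here is a minimal Python sketch using only the standard library. It does not touch EZKL or any GPU: the helper names (`f32`, `bits_of`, `from_bits`) are hypothetical, and the fused multiply-add is emulated by doing the intermediate arithmetic in binary64, which happens to be exact for the inputs chosen here. Treat it as an illustration of the effect, not a reproduction of any specific hardware behaviour.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def bits_of(x: float) -> int:
    """Return the float32 bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def from_bits(b: int) -> float:
    """Interpret an unsigned 32-bit integer as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


# 1. Signed zero: the values compare equal, the bit patterns do not.
zeros_equal = (0.0 == -0.0)
zero_bits = (bits_of(0.0), bits_of(-0.0))

# 2. NaN payloads: two different bit patterns, both decode to NaN.
nan_a = from_bits(0x7FC00000)
nan_b = from_bits(0x7FC00001)
both_nan = math.isnan(nan_a) and math.isnan(nan_b)

# 3. FMA contraction: a*a + c with one rounding versus two.
a = f32(1.0 + 2.0 ** -12)
c = f32(-(1.0 + 2.0 ** -11))
unfused = f32(f32(a * a) + c)  # product rounded to float32, then added
fused = f32(a * a + c)         # exact product (in binary64), single rounding

print(zeros_equal, [hex(b) for b in zero_bits])
print(both_nan)
print(unfused, fused)
```

In this example the unfused path loses the low bit of the product to rounding and returns exactly zero, while the fused path keeps it and returns 2^-24. Any circuit that consumes the raw bit patterns would see two different witnesses.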

This is not a bug in EZKL. It is a float representation problem that sits underneath any proof system operating on IEEE-754 values.

I built a protocol that solves this at the representation layer. Samayuktam/SPCMP maps every float32 through a bijective, order-preserving transformation into a canonical uint32 space, collapses all NaN payload variants to a single canonical quiet NaN, and eliminates signed-zero contamination before any proof generation occurs. The result is a bit-identical canonical representation regardless of which hardware produced the float.
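The SPCMP specification is not reproduced in this issue, so the following is not the protocol itself. It is a minimal sketch of a well-known technique with the same stated properties: canonicalise NaN and signed zero, then apply the classic sign-flip trick that turns float32 bit patterns into unsigned integers whose ordering matches the numeric ordering. All names (`canonical_key`, `key_to_bits`, `CANONICAL_QNAN`) and the choice of `0x7FC00000` as the canonical quiet NaN are assumptions made for illustration.

```python
import struct

CANONICAL_QNAN = 0x7FC00000  # assumed canonical quiet-NaN bit pattern
SIGN_BIT = 0x80000000
MASK32 = 0xFFFFFFFF


def float32_bits(x: float) -> int:
    """Return the float32 bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def canonical_key(bits: int) -> int:
    """Map raw float32 bits to a canonical, order-preserving uint32 key."""
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        bits = CANONICAL_QNAN   # collapse every NaN payload and sign
    elif bits == SIGN_BIT:
        bits = 0                # fold -0.0 into +0.0
    if bits & SIGN_BIT:
        return ~bits & MASK32   # negatives: invert all bits
    return bits | SIGN_BIT      # non-negatives: set the top bit


def key_to_bits(key: int) -> int:
    """Invert canonical_key for keys it actually produces."""
    if key & SIGN_BIT:
        return key ^ SIGN_BIT
    return ~key & MASK32


values = [float("-inf"), -1.5, -1e-40, 0.0, 1e-40, 1.0, float("inf")]
keys = [canonical_key(float32_bits(v)) for v in values]
print(keys == sorted(keys))
print(hex(canonical_key(0x7FC00001)), hex(canonical_key(0xFFC12345)))
```

Note that in this sketch the mapping is invertible only on canonicalised inputs: once NaN payloads and `-0.0` are collapsed, distinct raw bit patterns share a key, so `key_to_bits` recovers the canonical pattern rather than the original one. With the constants assumed here, the canonical NaN sorts above positive infinity.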

Audit results (May 2026, TPU v5e hardware): 81 assertions, 0 failures. Cross-architecture: Intel x86, NVIDIA GPU, Google TPU v5e. Models tested: GPT-2 Small, BERT-base, LLaMA-style 32K vocab.

The protocol is under provisional patent. Full audit report and specifications are at: https://swapnopammitra.github.io/Pr1malFrameWork/

If this intersects with what EZKL is working on at the witness generation layer, I am reachable directly.

Cross-hardware float non-determinism as a root cause of GPU prove errors — and a working solution #1026