add mobench support to ProveKit main #429
Mobench Benchmark Results

iOS (iOS version reported as unknown): three result groups, each covering iPhone 14, iPhone 16 Pro Max, and iPhone SE 2022. Every run used 2 iterations · 1 warmup · avg is primary metric. [Result tables not preserved]
[Figure: iOS sina plot]

Android (Android version reported as unknown): three result groups, each covering Google Pixel 7, Samsung Galaxy M32, and Samsung Galaxy S24. avg is primary metric throughout. [Result tables not preserved]
First group: 2 iterations · 1 warmup on all three devices.
Second group: 1 iteration · 0 warmup on Google Pixel 7 and Samsung Galaxy M32; 2 iterations · 1 warmup on Samsung Galaxy S24.
Third group: 2 iterations · 1 warmup on Google Pixel 7 and Samsung Galaxy S24; 1 iteration · 0 warmup on Samsung Galaxy M32.
[Figure: Android sina plot]

Note: memory growth excludes warmup/baseline retained before the measured iteration.

Posted by mobench at 2026-05-25 13:13 UTC
CSP benchmarks
Prover time, peak RSS, peak heap, and verifier time are arithmetic means across the iterations. Peak heap comes from the largest
No baseline available yet; deltas will appear once this workflow has produced at least one successful
Results
| @@ -0,0 +1,24 @@ | |||
| dg1 = [60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 48, 55, 48, 49, 48, 49, 60, 60, 51, 50, 48, 49, 48, 49, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 0, 0] | |||
Can you keep the creation and usage of artifacts inside the workflow itself? That way we wouldn't need to duplicate TOML files or push compiled JSON circuits.
Done in f08820c4: the compiled JSON fixtures and duplicated Prover.toml files are removed from bench-mobile. CI and the BrowserStack reusable workflow now install Noir, run bench-mobile/scripts/generate-fixtures.sh, and build.rs copies the generated noir-examples/*/target/*.json artifacts into OUT_DIR for embedding.
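A minimal sketch of the copy step a `build.rs` like this might perform. The function name, the `<example>__<file>.json` naming scheme, and the exact directory layout are assumptions for illustration; the actual script in the PR may differ.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Copies every `<examples_root>/<example>/target/*.json` file into `out_dir`.
/// Each copy is named `<example>__<file>` so artifacts from different examples
/// cannot collide (hypothetical naming scheme).
fn copy_circuit_artifacts(examples_root: &Path, out_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut copied = Vec::new();
    for example in fs::read_dir(examples_root)? {
        let example = example?;
        let target_dir = example.path().join("target");
        if !target_dir.is_dir() {
            continue; // not a directory, or the example has not been compiled
        }
        for artifact in fs::read_dir(&target_dir)? {
            let src = artifact?.path();
            if src.extension().and_then(|e| e.to_str()) != Some("json") {
                continue; // only compiled circuit JSON is embedded
            }
            let name = format!(
                "{}__{}",
                example.file_name().to_string_lossy(),
                src.file_name().unwrap().to_string_lossy()
            );
            let dest = out_dir.join(name);
            fs::copy(&src, &dest)?;
            // Ask Cargo to rerun the build script when a generated artifact changes.
            println!("cargo:rerun-if-changed={}", src.display());
            copied.push(dest);
        }
    }
    copied.sort();
    Ok(copied)
}
```

In a real build script, `main` would call this with the `noir-examples` directory and the path from the `OUT_DIR` environment variable that Cargo sets, and the crate could then embed a copied file with `include_str!(concat!(env!("OUT_DIR"), "/<name>"))`.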
@@ -0,0 +1,3 @@
fn main() {
    uniffi::uniffi_bindgen_main()
}
Can we use our own FFI crate?
Which one is that? I could probably do this, but I might need to refactor mobench. Can you share a link to the crate?
Done in f08820c4: bench-mobile now depends on provekit-ffi and uses a new provekit_ffi::in_process helper for prepare/prove/verify. I kept the tiny uniffi-bindgen binary because mobench still expects a UniFFI bindgen entrypoint for mobile runner generation; the ProveKit FFI crate itself remains the proving integration point.
Can we also add mid- and high-tier devices for Android and iOS to the benchmarks?
Pushed
I’m waiting for the pushed commit’s GitHub workflows to materialize so the BrowserStack run can execute through repo secrets.
9a7cd1a to f08820c
Latest mobile benchmark numbers
Source: Mobile Bench PR Auto run 26368378239 on iOS
Android
Memory growth is the benchmark-reported peak growth above baseline; process peak is the reported process RSS peak.
Closed in favor of the replacement PR using the new
Previous CI / benchmark runs for this PR head:
Latest benchmark numbers were reposted on #450: #450 (comment)
Summary
provespan
Android mobench fix note
The previous failing run was https://github.com/worldfnd/provekit/actions/runs/26007468825. The missing Vivo Y21 monolithic cell did not recover a BrowserStack session payload or `summary.json`; available artifacts only show the BrowserStack fetch timeout after 7200s for build `c943753d95fd0b34f5775aa0a3bc6ff58cbcc3ca`. I grepped the recovered Android artifacts and job log for `lowmemorykiller`, `Process * was killed`, `oom_reaper`, `SIGKILL`, and abnormal signal text; there was no hit because BrowserStack did not return the killed session logs for that cell.

Before/failure memory from that run:
What changed:
- `profile_phase("prove")` wraps only the prover entry point. Previously `prepared.clone()` ran inside `profile_phase("prove")`, so clone cost and peak memory were charged to proving.
- `failure.json` now records attempts, fetch timeout seconds, build id, and any LMK/OOM/SIGKILL lines recovered from attempt/device logs.

After numbers: pending the fresh BrowserStack rerun on this commit.
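The scoping change to the `prove` phase can be illustrated with a small sketch. Everything here is a stand-in: this `profile_phase` is a hypothetical helper (not mobench's actual API) that counts clones as a proxy for the time and memory a real profiler would attribute to a phase, and `Prepared` and `prove` are placeholder types.

```rust
use std::cell::Cell;

thread_local! {
    // Counts `Prepared` clones on this thread; a proxy for clone cost.
    static CLONES: Cell<u32> = Cell::new(0);
}

/// Placeholder for prepared prover state (hypothetical type).
struct Prepared {
    witness: Vec<u64>,
}

impl Clone for Prepared {
    fn clone(&self) -> Self {
        CLONES.with(|c| c.set(c.get() + 1));
        Prepared { witness: self.witness.clone() }
    }
}

/// What a phase was charged with: here, the number of clones seen while it ran.
struct PhaseReport {
    name: String,
    clones_charged: u32,
}

/// Hypothetical stand-in for a phase profiler: runs `f` and reports the
/// clones that happened between entering and leaving the phase.
fn profile_phase<T>(name: &str, f: impl FnOnce() -> T) -> (T, PhaseReport) {
    let before = CLONES.with(|c| c.get());
    let out = f();
    let after = CLONES.with(|c| c.get());
    (out, PhaseReport { name: name.to_string(), clones_charged: after - before })
}

/// Placeholder prover entry point that consumes its input.
fn prove(prepared: Prepared) -> u64 {
    prepared.witness.iter().sum()
}

/// Before: the clone runs inside the phase, so it is charged to "prove".
fn run_before(prepared: &Prepared) -> (u64, PhaseReport) {
    profile_phase("prove", || prove(prepared.clone()))
}

/// After: the clone runs first, so the phase wraps only the prover entry point.
fn run_after(prepared: &Prepared) -> (u64, PhaseReport) {
    let owned = prepared.clone();
    profile_phase("prove", move || prove(owned))
}
```

Both variants return the same proof result; only the attribution differs, with `run_before` charging one clone to the phase and `run_after` charging none.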
Validation
- `cargo fmt --all`
- `cargo test -p bench-mobile --lib`
- `cargo test -p bench-mobile --test examples_smoke`
- `cargo test -p bench-mobile --test passport_smoke`
- `cargo check -p provekit-ffi --target aarch64-linux-android` with NDK 26.1 `aarch64-linux-android34-clang`
- `ruby -e 'require "yaml"; YAML.load_file(".github/workflows/mobile-bench-reusable.yml")'`
- `git diff --check`