Bring HVF round-trip floor band into the shim #61

Open
jserv wants to merge 1 commit into main from hvf-round-trip

Conversation

Contributor

@jserv jserv commented May 30, 2026

This closes the per-call shim path: a diagnostic counter array so every fast-path bail is attributed, inline getpgid(0) / getsid(0) using two new identity slots, and a 256-byte urandom / getrandom inline copy with 4 KiB ring-wrap split. The bulk copy uses ldp/stp with a tbz-driven 1..15 byte tail; the ring lock uses LSE swpal; the slow-path refill runs arc4random_buf outside the lock; the second AT probe is skipped when buf and buf+len-1 share a host page. PGSID_PUBLISH reads (pgid, sid) under session_lock via a new proc_snapshot_pgsid so a concurrent setsid on a sibling vCPU cannot publish a torn pair.
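
The same-page test that lets the second probe be skipped can be sketched in C as follows. This is only an illustration of the idea, not the shim's assembly; the function name and the `page_size` parameter are assumptions, and the page size must be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when the first and last byte of
 * [buf, buf + len) fall in the same page, so a single translation
 * probe covers the whole buffer. page_size must be a power of two. */
static bool same_page(uintptr_t buf, size_t len, uintptr_t page_size)
{
    assert(len > 0);
    uintptr_t last = buf + len - 1;
    /* Bits above the page offset differ only if a boundary is crossed. */
    return ((buf ^ last) & ~(page_size - 1)) == 0;
}
```

With a 4096-byte page, a 2-byte buffer at `0x1FFF` crosses a boundary and would still need both probes, while a 256-byte buffer at `0x1000` would not.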

Measured (Apple M1, 100k iter):

  read(urandom, 1)    133 -> 100 ns  (-25 %)
  read(urandom, 64)   224 -> 163 ns  (-27 %)
  read(urandom, 256)  528 -> 355 ns  (-33 %)
  getrandom(1)        128 ->  95 ns  (-26 %)
  getrandom(256)      534 -> 366 ns  (-31 %)

CB_URANDOM_RING_WRAP is retained for ABI stability but stays at zero now that wrap is served inline; a non-zero reading flags a regression.
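
For readers unfamiliar with the wrap split, a minimal C sketch of copying out of a 4 KiB ring is shown here. It assumes a simple head offset and uses `memcpy` in place of the ldp/stp loop; `ring_copy_out` and its parameters are hypothetical names, not the shim's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4096u /* 4 KiB ring, as described in this PR */

/* Hypothetical helper: copy len bytes out of the ring starting at offset
 * head, splitting the copy in two when it would run past the end of the
 * ring. Returns the new head offset. */
static size_t ring_copy_out(const uint8_t *ring, size_t head,
                            uint8_t *dst, size_t len)
{
    assert(head < RING_SIZE && len <= RING_SIZE);
    size_t first = RING_SIZE - head; /* bytes available before the wrap */
    if (first > len)
        first = len;
    memcpy(dst, ring + head, first);
    memcpy(dst + first, ring, len - first); /* zero bytes when no wrap */
    return (head + len) % RING_SIZE;
}
```

A 16-byte request at head 4090 takes 6 bytes from the end of the ring and 10 from the start, leaving the head at 10.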


Summary by cubic

Moves short getrandom/urandom and getpgid/getsid work into the EL1 shim to avoid HVF round-trips. Adds per-call counters and cuts 1–256 byte RNG calls by ~25–33% on Apple M1.

  • New Features

    • Inline fast paths for read(/dev/urandom) and getrandom up to 256 bytes with a wrap-aware 4 KiB ring copy, LSE swpal lock, slow-path refill outside the lock, and a skipped second AT probe when the buffer stays on one page. getrandom accepts flags 0 and GRND_NONBLOCK; others fall back to the host.
    • Inline getpgid(0)/getsid(0) served from new pgid/sid slots; values are published under session_lock via a new snapshot path, and republished on bootstrap, fork, exec, setsid, and setpgid.
    • Diagnostic counters in the shim for all hit/bail reasons; gated dump on exit via ELFUSE_SHIM_STATS, plus helper APIs to read/print them.
  • Bug Fixes

    • Fixed wrapped ring copies in the inline RNG path by splitting the copy at the 4 KiB boundary; added tests/test-shim-urandom-wrap.c.
    • Kept CB_URANDOM_RING_WRAP for ABI stability; it remains zero and now flags regressions if non-zero.
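
The snapshot idea behind the pgid/sid publication can be illustrated with a small C sketch. The struct layout, `snapshot_pgsid`, and `apply_setsid` are assumptions made for illustration, and a pthread mutex stands in for session_lock; the actual proc_snapshot_pgsid is not shown in this conversation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical process-identity state guarded by one lock. */
struct proc_ids {
    pthread_mutex_t session_lock;
    int32_t pgid;
    int32_t sid;
};

struct pgsid_pair {
    int32_t pgid;
    int32_t sid;
};

/* Read both fields under a single lock hold so a concurrent writer
 * can never be observed half-applied. */
static struct pgsid_pair snapshot_pgsid(struct proc_ids *p)
{
    struct pgsid_pair out;
    pthread_mutex_lock(&p->session_lock);
    out.pgid = p->pgid;
    out.sid = p->sid;
    pthread_mutex_unlock(&p->session_lock);
    return out;
}

/* Writer side: setsid() makes the caller leader of a new session and a
 * new process group, so both ids change together under the same lock. */
static void apply_setsid(struct proc_ids *p, int32_t pid)
{
    assert(pid > 0);
    pthread_mutex_lock(&p->session_lock);
    p->pgid = pid;
    p->sid = pid;
    pthread_mutex_unlock(&p->session_lock);
}
```

Publishing the returned pair, rather than reading the two fields separately, is what rules out a mixed old/new result.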

Written for commit d79de9f. Summary will update on new commits.

@jserv jserv requested a review from Max042004 May 30, 2026 15:52
cubic-dev-ai[bot]

This comment was marked as resolved.

@jserv jserv force-pushed the hvf-round-trip branch from 945fada to d79de9f May 30, 2026 16:36
Comment thread src/core/shim.S
* RESTORE_GPRS_KEEP_X0 reloads X1..X30 from the saved frame, so x29/x30
* are safe choices on any fast path.
*/
.macro COUNTER_INC byte_off, tmp_addr, tmp_val
Collaborator

ELFUSE_SHIM_STATS gates only the dump. The COUNTER_INC expansion in shim.S reads no such flag, so even with ELFUSE_SHIM_STATS unset, every getpid/getuid/gettid call still increments a counter.

The second cost is that every host vCPU thread issues a non-atomic load-add-store to the same counter cache line, causing cache-line ping-pong.
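
One possible way to address the second point, sketched in C rather than shim assembly: give each vCPU its own cache-line-aligned counter block and sum the blocks only when dumping. Everything here is hypothetical (the names, the vCPU and counter limits, and the 128-byte line size, which is an assumption about Apple silicon hosts).

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define MAX_VCPUS 8     /* hypothetical limit */
#define NUM_COUNTERS 16 /* hypothetical counter count */
#define CACHE_LINE 128  /* assumed host cache-line size */

/* One counter block per vCPU, aligned so blocks never share a line. */
struct vcpu_counters {
    alignas(CACHE_LINE) uint64_t c[NUM_COUNTERS];
};

static struct vcpu_counters counters[MAX_VCPUS];

/* Only the owning vCPU writes its block, so a plain increment does not
 * contend with other vCPUs. */
static void counter_inc(unsigned vcpu, unsigned idx)
{
    assert(vcpu < MAX_VCPUS && idx < NUM_COUNTERS);
    counters[vcpu].c[idx]++;
}

/* Sum across vCPUs; intended to run once the vCPUs have stopped,
 * for example in the dump at exit. */
static uint64_t counter_total(unsigned idx)
{
    assert(idx < NUM_COUNTERS);
    uint64_t sum = 0;
    for (unsigned v = 0; v < MAX_VCPUS; v++)
        sum += counters[v].c[idx];
    return sum;
}
```

This sketch does not cover the first point: skipping the increment when ELFUSE_SHIM_STATS is unset would still need a separate gate or build-time option.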
