fix(httpsig): percent-encode non-token keys and body-part names #995
timothynodes wants to merge 1 commit into
Two related fixes for AO-Core keys whose names are not valid HTTP header field names (spaces, emoji, uppercase, etc.).

hb_escape: add is_http_token/1 and encode_http_key/1. A key is encoded only when it is not already a valid lowercase HTTP token, so already-valid keys stay human-readable on the wire while everything else becomes a legal header name. Reversed by the existing decode/1.

dev_httpsig_conv: generalize encode_ids/1 to percent-encode any non-token key rather than only ID-shaped keys (matched by byte size). Previously a tag named e.g. "my <emoji> tag" was emitted verbatim as an illegal header and rejected downstream, surfacing as a 502 from a fronting proxy.

Symmetrically, escape the Content-Disposition `name' in encode_body_part/4 and decode it in from_body_part/3: large (> MAX_HEADER_LENGTH) and/or nested values are lifted into their own body part keyed by their full flat path, which encode_ids/1 never sees, so their raw bytes reached the wire and crashed the structured-field parser on decode. This mirrors the existing percent-encoding of committed key names in dev_httpsig_siginfo.

Add round-trip tests for both fixes; existing keys are byte-unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
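The encode-only-when-needed rule described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from hb_escape: the module name http_key_sketch, the helper names, the exact "safe" alphabet (lowercase letters, digits, and -_.), the lowercase hex digits, and the per-byte encoding strategy are all assumptions made for the example.

```erlang
%% Illustrative sketch only; NOT the implementation in hb_escape.erl.
%% Module name, helpers, and the safe alphabet are assumptions.
-module(http_key_sketch).
-export([is_http_token/1, encode_http_key/1, decode/1]).

%% True when the key is non-empty and every byte is a lowercase
%% letter, a digit, or one of "-_." (assumed safe alphabet).
is_http_token(<<>>) -> false;
is_http_token(Key) when is_binary(Key) ->
    lists:all(fun is_token_byte/1, binary_to_list(Key)).

is_token_byte(C) when C >= $a, C =< $z -> true;
is_token_byte(C) when C >= $0, C =< $9 -> true;
is_token_byte(C) -> lists:member(C, "-_.").

%% Valid keys are returned byte-identical; any other key has each
%% unsafe byte replaced by "%" plus two lowercase hex digits.
encode_http_key(Key) when is_binary(Key) ->
    case is_http_token(Key) of
        true -> Key;
        false -> << <<(encode_byte(C))/binary>> || <<C>> <= Key >>
    end.

encode_byte(C) ->
    case is_token_byte(C) of
        true -> <<C>>;
        false -> list_to_binary(io_lib:format("%~2.16.0b", [C]))
    end.

%% Reverse the percent-encoding. Safe keys never contain "%",
%% so decoding them is the identity.
decode(<<$%, H, L, Rest/binary>>) ->
    Byte = list_to_integer([H, L], 16),
    <<Byte, (decode(Rest))/binary>>;
decode(<<C, Rest/binary>>) ->
    <<C, (decode(Rest))/binary>>;
decode(<<>>) ->
    <<>>.
```

Under these assumptions, http_key_sketch:encode_http_key(<<"content-type">>) returns its input unchanged, while http_key_sketch:encode_http_key(<<"my tag">>) returns <<"my%20tag">>, and decode/1 restores the original in both cases. Because "%" is itself a legal token character, the encoded form is a valid header field name.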
Test: httpsig non-token-key header encoding (502 regression)

Steps
curl -s -o /dev/null -w "%{http_code}\n" "https://hc.timothynode.com/~cache@1.0/read?read=UB33qavPDZkjCQ0Ezs0TqIsJs3XVjPpyibumrKSL7F8"

Expected result
200 with the fix applied; 502 without it.

Observed
✅ 200 (fixed) vs 502 (unfixed), matching the expectation.
Summary
Two related fixes for AO-Core message keys whose names are not valid HTTP header field names (spaces, emoji, uppercase, etc.).
hb_escape.erl

Add is_http_token/1 and encode_http_key/1. A key is percent-encoded only when it is not already a valid lowercase HTTP token, so already-valid keys stay byte-identical and human-readable on the wire while everything else becomes a legal header name. The transformation is reversed by the existing decode/1.

dev_httpsig_conv.erl

encode_ids/1 now percent-encodes any non-token key instead of only ID-shaped keys (which were matched by byte size). Previously a tag named e.g. "my <emoji> tag" was emitted verbatim as an illegal header and rejected downstream, which was observed as a 502 from a fronting proxy.

encode_body_part/4 and from_body_part/3 now symmetrically percent-encode and decode the Content-Disposition name. Large (> MAX_HEADER_LENGTH) and/or nested values are lifted into their own body part keyed by their full flat path, a name that encode_ids/1 never sees, so their raw bytes reached the wire and crashed the structured-field parser on decode (parse_string/2 only accepts 0x20–0x7E). This mirrors the existing percent-encoding of committed key names in dev_httpsig_siginfo. The / path separator is preserved so nested-path splitting still works.

Compatibility
Keys/part-names that are already valid (lowercase letters, digits, -_. and the / separator) are byte-unchanged, so existing signed messages and their IDs are unaffected. Only names that previously could not round-trip at all change form.

Tests
Round-trip tests added for both fixes; existing tests still pass.
hb_escape: is_http_token_test, encode_http_key_test

dev_httpsig_conv: encode_ids_round_trips_weird_keys_test, encode_body_part_escapes_weird_name_test, encode_large_nested_weird_key_round_trips_test (end-to-end httpsig round-trip of a >4096-byte nested weird-named value)

🤖 Generated with Claude Code
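
To make the body-part case concrete, here is a hedged sketch of encoding a flat path segment by segment so that the / separator survives and the result round-trips. It is not the code in dev_httpsig_conv: the module name part_name_sketch, the functions encode_name/1 and decode_name/1, and the assumption that raw key segments never contain a literal / are all hypothetical and chosen only for illustration.

```erlang
%% Illustrative sketch only; NOT the implementation in dev_httpsig_conv.erl.
%% Assumes path segments never contain a literal "/" themselves.
-module(part_name_sketch).
-export([encode_name/1, decode_name/1]).

%% Encode each path segment on its own, then rejoin with "/" so
%% nested-path splitting still works on the encoded name.
encode_name(Path) when is_binary(Path) ->
    Segments = binary:split(Path, <<"/">>, [global]),
    join([encode_segment(S) || S <- Segments]).

%% Split on "/" first, then decode each segment.
decode_name(Name) when is_binary(Name) ->
    Segments = binary:split(Name, <<"/">>, [global]),
    join([decode_segment(S) || S <- Segments]).

join(Segments) ->
    iolist_to_binary(lists:join(<<"/">>, Segments)).

encode_segment(Seg) ->
    << <<(encode_byte(C))/binary>> || <<C>> <= Seg >>.

%% Bytes in the assumed safe alphabet pass through; all others
%% become "%" plus two lowercase hex digits (always printable ASCII).
encode_byte(C) ->
    case is_safe(C) of
        true -> <<C>>;
        false -> list_to_binary(io_lib:format("%~2.16.0b", [C]))
    end.

is_safe(C) when C >= $a, C =< $z -> true;
is_safe(C) when C >= $0, C =< $9 -> true;
is_safe(C) -> lists:member(C, "-_.").

decode_segment(<<$%, H, L, Rest/binary>>) ->
    Byte = list_to_integer([H, L], 16),
    <<Byte, (decode_segment(Rest))/binary>>;
decode_segment(<<C, Rest/binary>>) ->
    <<C, (decode_segment(Rest))/binary>>;
decode_segment(<<>>) ->
    <<>>.
```

With this sketch, part_name_sketch:encode_name(<<"outer key/Inner">>) yields <<"outer%20key/%49nner">>, and decode_name/1 returns the original path. An already-valid name such as <<"body/data">> passes through byte-unchanged, which is the property the Compatibility section relies on.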