Fix uuid() accepting {, }, and prefixes inside the string#359
Conversation
Instead of stripping "urn:", "uuid:", "{" and "}" from anywhere in the value before matching.
So strings like "5{}50e8400-..." or an interior "urn:" passed the assertion and the un-normalised string was returned to the caller.
Moved the validated into the regex:
- The prefixes are only accepted at the start
- A leading "{" must be paired with a closing "}" using a PCRE conditional
| // Accepts the plain form (including the nil UUID with all 128 bits | ||
| // set to zero), optionally preceded by "urn:" and/or "uuid:" and | ||
| // optionally wrapped in a matching pair of curly braces. | ||
| if (!\preg_match('/^(?:urn:)?(?:uuid:)?(\{)?[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}(?(1)\})$/D', $value)) { |
There was a problem hiding this comment.
I think it would be better to break out these different forms, such that it looks like:
if (\str_starts_with($value, 'urn:uuid:') && \preg_match(...)) {
return $value;
}
if (\str_starts_with($value, 'uuid:') && \preg_match(...)) {
return $value;
}
if (\str_starts_with($value, '{') && \str_ends_with('}') && \preg_match(...)) {
return $value;
}
if (\preg_match(...)) {
return $value;
}
// resolve message, report invalidWhile more verbose, having separate forks for each form makes intent much more clear and provides more meaningful code coverage.
|
|
||
| // The nil UUID is special form of UUID that is specified to have all | ||
| // 128 bits set to zero. | ||
| if ('00000000-0000-0000-0000-000000000000' === $value) { |
There was a problem hiding this comment.
Having this as a separate condition is probably not necessary. The performance difference is incredibly small with PHP 8+
There was a problem hiding this comment.
I don't think I follow, this code exists right now in master and was removed. So are you agreeing that this isn't needed and are pointing it out for the if clauses above?
| @@ -588,7 +588,9 @@ public static function getTests(): array | |||
| ['isNonEmptyMap', [[1, 2, 3]], false], | |||
| ['uuid', ['00000000-0000-0000-0000-000000000000'], true], | |||
| ['uuid', ['urn:ff6f8cb0-c57d-21e1-9b21-0800200c9a66'], true], | |||
There was a problem hiding this comment.
This should actually be removed, it is technically not a valid urn: per RFC 4122/9562, since RFC 2141 defines a URN as urn:<NID>:<NSS>.
Hi,
A hopefully small fix for UUIDs being allowed to contain the prefixes and {} inside the string.
If you'd prefer the regular expression to be multiple and commented inline, I'd be happy to do that as well.
I hope the test cases are self-explanatory regarding the behavioral changes.
Moved the validation into the regex: