
Fix path/branch extraction removing every /blob/ and /tree/ marker, not just the leading one (fixes #150) #152

Merged
protoroto merged 1 commit into nephila:master from patchwright:bugfix/issue-150-fix-blob-tree-prefix-strip
Jun 16, 2026

@patchwright

Contributor

Fixes #150.

Re-opened from #149 with the branch renamed to bugfix/issue-150-fix-blob-tree-prefix-strip so the towncrier branch-name check passes (per @protoroto's request). Changelog fragment renamed to changes/150.bugfix. Same commit and diff as #149 — no code changes.


Description

path and branch are extracted from the matched URL by removing the leading
/blob/, /tree/, /-/blob/ or /-/tree/ marker. The code did this with
str.replace(marker, ""), which removes every occurrence of the marker, not
just the leading one. When the file path or branch name legitimately contains the
same segment again, the result is silently corrupted.

For example:

>>> import giturlparse
>>> giturlparse.parse("https://github.com/owner/repo/blob/main/src/blob/utils.py").path
'main/srcutils.py'        # expected: 'main/src/blob/utils.py'
>>> giturlparse.parse("https://github.com/owner/repo/tree/feature/tree/x").branch
'featurex'                # expected: 'feature/tree/x'
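
The corruption can be reproduced without the library at all. The following is a minimal sketch in plain Python; `path_raw` and `marker` are illustrative names standing in for the raw URL segment and the prefix being stripped, and the input string is taken from the example above.

```python
# Stand-in for the raw path segment captured from the URL (assumed shape).
path_raw = "/blob/main/src/blob/utils.py"
marker = "/blob/"

# str.replace removes every occurrence of the marker, including the
# one that is legitimately part of the file path.
replaced_all = path_raw.replace(marker, "")

# Slicing removes only the leading marker, whose presence startswith() confirms.
assert path_raw.startswith(marker)
sliced = path_raw[len(marker):]

print(replaced_all)  # main/srcutils.py
print(sliced)        # main/src/blob/utils.py
```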

The leading marker is already guaranteed by the preceding startswith(...) check,
so the fix is to slice off exactly that prefix instead of calling replace:

data["path"] = data["path_raw"][len("/blob/"):]

Slicing is used (rather than str.removeprefix, which was only added in
Python 3.9) to remain compatible with the declared python_requires = >=3.8.
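
As a rough illustration of the startswith-then-slice pattern on Python 3.8, the sketch below handles all four markers named in the description. The helper `split_leading_marker` and the `MARKERS` tuple are hypothetical names introduced here; they are not part of giturlparse, and the actual platform code may be organized differently.

```python
from typing import Optional, Tuple

# Markers named in the description; this tuple is illustrative only.
MARKERS = ("/blob/", "/tree/", "/-/blob/", "/-/tree/")


def split_leading_marker(path_raw: str) -> Optional[Tuple[str, str]]:
    """Return (marker, remainder) if path_raw starts with a known marker.

    Only the leading marker is removed; later occurrences are kept.
    Returns None when no marker matches.
    """
    for marker in MARKERS:
        if path_raw.startswith(marker):
            # Slice instead of str.removeprefix to stay 3.8-compatible.
            return marker, path_raw[len(marker):]
    return None


print(split_leading_marker("/-/tree/feature/tree/x"))
print(split_leading_marker("/issues/1"))
```

Returning the matched marker alongside the remainder lets a caller decide whether the value is a path (blob) or a branch (tree) without checking the prefix a second time.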

Fixed in both the GitHub and GitLab platforms.

References

Issue #150.

Checklist

  • Code lint checked via inv lint (ruff, black, isort all clean on the changed files)
  • Tests added (regression cases for nested /blob/ paths and /tree/ branch names on both GitHub and GitLab; verified to fail on the previous code and pass with the fix)
  • Changelog fragment added under changes/
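
The "fails on the previous code, passes with the fix" check from the list above can be sketched as a small case table. This is not the project's test suite: `old_strip` and `new_strip` are stand-ins for the behaviour before and after the change, and the GitLab-style inputs are hypothetical examples built by analogy with the GitHub ones.

```python
# (marker, raw segment, expected value once the leading marker is stripped)
CASES = [
    ("/blob/", "/blob/main/src/blob/utils.py", "main/src/blob/utils.py"),
    ("/tree/", "/tree/feature/tree/x", "feature/tree/x"),
    # Hypothetical GitLab-style inputs, assumed for illustration.
    ("/-/blob/", "/-/blob/main/src/-/blob/utils.py", "main/src/-/blob/utils.py"),
    ("/-/tree/", "/-/tree/feature/-/tree/x", "feature/-/tree/x"),
]


def old_strip(raw: str, marker: str) -> str:
    # Stand-in for the previous behaviour: removes every occurrence.
    return raw.replace(marker, "")


def new_strip(raw: str, marker: str) -> str:
    # Stand-in for the fixed behaviour: removes only the leading marker.
    return raw[len(marker):]


def failing_cases(strip):
    """Return the raw inputs for which strip() does not give the expected value."""
    return [raw for marker, raw, expected in CASES if strip(raw, marker) != expected]


print(len(failing_cases(old_strip)))  # 4
print(len(failing_cases(new_strip)))  # 0
```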

GitHubPlatform and GitLabPlatform built `path`/`branch` with
str.replace(marker, ""), which removes *every* occurrence of the
marker rather than only the leading one. A URL whose file path or
branch name contained the same segment again was silently corrupted:

    parse('.../blob/main/src/blob/utils.py').path  -> 'main/srcutils.py'
    parse('.../tree/feature/tree/x').branch        -> 'featurex'

The leading marker is already guaranteed by the preceding startswith
check, so slice it off instead. Slicing (rather than str.removeprefix)
keeps the declared Python 3.8 compatibility. Adds regression tests for
nested /blob/ paths and /tree/ branch names on both GitHub and GitLab.
@protoroto protoroto merged commit cf24925 into nephila:master Jun 16, 2026
17 of 28 checks passed
@protoroto

Member

@patchwright Towncrier kept complaining (somehow it was still checking against the old branch name)... anyway, I've merged and released version 0.15.0 :) thanks again for contributing!


Development

Successfully merging this pull request may close these issues.

Path/branch extraction removes every /blob/ and /tree/ marker, not just the leading one
