[26.04_linux-nvidia-bos] NVIDIA: VR: SAUCE: cxl: guard unlinked memdev endpoints #482
Open
nirmoy wants to merge 1 commit into
cxlmd->endpoint starts as ERR_PTR(-ENXIO) until endpoint port registration links the memdev to a real cxl_port. Treat NULL and error pointers as "endpoint not linked" before dereferencing cxlmd->endpoint in CXL helper paths.

The BOS region-management backport exposes these helpers before endpoint linkage.

This backports commit aff4cce ("NVIDIA: VR: SAUCE: cxl: Guard unlinked memdev endpoints"). Its PCI hunk is omitted because BOS already guards cxl_reset_done().

Fixes: 29317f8 ("cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation")
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
PR Validation Report

Patchscan ✅ No Missing Fixes
All cherry-picked commits checked; no missing upstream fixes found.

PR Lint ✅ All checks passed
Checking 1 commits...
Cherry-pick digest:
┌──────────────┬──────────────────────────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐
│ Local        │ Referenced upstream / Patch subject                              │ Patch-ID   │ Subject │ SoB chain                 │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 2c512ff1698a │ [SAUCE] cxl: guard unlinked memdev endpoints                     │ N/A        │ N/A     │ nirmoyd                   │
└──────────────┴──────────────────────────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘
Lint: all checks passed.
BaseOS Kernel Review

Summary
No issues found across the reviewed commits.

Findings: no problems found
Latest watcher review: open review
Generated test plan: open test plan
Kernel deb build: successful (download debs, 4 files)
Head:

This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher publishes a newer review.
Strata/Vera boot smoke (2026-07-03)
This looks like a fix that can go upstream. Is there any plan to post this to the LKML? Other than that, LGTM.
Summary
- Treat NULL and error-valued cxlmd->endpoint pointers as unlinked before entering CXL HDM and region helper paths.
- Backport the hdm.c and region.c changes from aff4ccee4530 onto BOS/Resolute.
- Omit the drivers/cxl/pci.c hunk because this branch already guards cxl_reset_done() against both NULL and ERR_PTR endpoints.

Root cause
cxlmd->endpoint starts as ERR_PTR(-ENXIO) until endpoint-port registration links the memdev to a real cxl_port. The affected helpers checked only for NULL, allowing early CXL consumers to pass the error pointer into functions such as device_find_child().

The BOS region-management backport exposes these helpers before endpoint linkage, producing the observed NVIDIA probe failure and boot delay.
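To illustrate why a NULL-only check is insufficient, here is a minimal userspace sketch. It is not the actual patch: the ERR_PTR/IS_ERR/IS_ERR_OR_NULL helpers are stand-ins modelled on the kernel's linux/err.h, the two structs are trimmed-down hypothetical versions of the driver types, and endpoint_linked_null_only() and endpoint_linked() are illustrative names that do not appear in the driver.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins modelled on the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Hypothetical, trimmed-down versions of the driver structures. */
struct cxl_port {
	int id;
};

struct cxl_memdev {
	struct cxl_port *endpoint; /* ERR_PTR(-ENXIO) until linked */
};

/* Pre-fix shape (assumed): only NULL is treated as "not linked". */
static int endpoint_linked_null_only(const struct cxl_memdev *cxlmd)
{
	return cxlmd->endpoint != NULL;
}

/* Post-fix shape (assumed): NULL and error pointers both mean "not linked". */
static int endpoint_linked(const struct cxl_memdev *cxlmd)
{
	return !IS_ERR_OR_NULL(cxlmd->endpoint);
}
```

With endpoint set to ERR_PTR(-ENXIO), the NULL-only check reports the memdev as linked, so a caller would go on to dereference the error pointer; the combined check rejects it. A helper guarded this way can return early (for example with -ENXIO) until endpoint-port registration has stored a real cxl_port.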
Validation
- git diff --check: pass.
- checkpatch.pl: 0 errors, 0 warnings.
- aff4ccee4530.
- CONFIG_CXL_BUS, CONFIG_CXL_MEM, CONFIG_CXL_PCI, and CONFIG_CXL_REGION are enabled in the generated amd64 BOS configuration.
- drivers/cxl/core/hdm.o, region.o, and drivers/cxl/core/built-in.a: pass.

Tracking