Skip to content

fix: support Graviton (aarch64) bare metal instance types#75

Open
fonta-rh wants to merge 2 commits into
openshift-eng:mainfrom
fonta-rh:ocpedge-2608-graviton
Open

fix: support Graviton (aarch64) bare metal instance types#75
fonta-rh wants to merge 2 commits into
openshift-eng:mainfrom
fonta-rh:ocpedge-2608-graviton

Conversation

@fonta-rh

@fonta-rh fonta-rh commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Broaden metal-detection regex from c[0-9]+[a-z]*.metal (c-family only) to \.metal$, covering all bare metal instance types including Graviton (c7g.metal, m7g.metal, r7g.metal)
  • Fix AMI auto-detect for aarch64: add get_ami_arch() to map aarch64arm64 (AWS uses arm64 in AMI names), and remove *GA* filter that excluded all arm64 AMIs (and limited x86_64 to an older GA build)
  • Add Graviton instance type hints to instance.env.template
  • Add Graviton usage documentation to deploy/aws-hypervisor/README.md (configuration, Metal3 image overrides, known limitations)
  • Document build-metal3-arm64.sh in helpers/README.md

Depends on #73 (branched from openshift-clusters-common-sh).

Test plan

  • make shellcheck passes
  • AMI auto-detect returns correct AMI for RHEL_HOST_ARCHITECTURE=x86_64
  • AMI auto-detect returns correct AMI for RHEL_HOST_ARCHITECTURE=aarch64
  • Metal regex matches c5n.metal, c7g.metal, m7g.metal, r7g.metal, i3.metal
  • Metal regex rejects c5n.xlarge, m5.2xlarge
  • make create + make destroy on a Graviton instance type (e.g. c7g.metal)

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved AMI auto-detection to normalize aarch64/Graviton architecture for more accurate image selection.
    • Enhanced detection of Graviton/bare-metal instance types by recognizing additional .metal naming patterns.
  • Documentation

    • Updated deployment guidance for aarch64/Graviton, including clarified defaults for AMI auto-detection.
    • Added/expanded “Metal3 Image Overrides” documentation for arm64/aarch64 image rebuilds and example environment variables.
    • Documented an arm64 Metal3 image builder tool and its usage/options.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 4, 2026
@openshift-ci

openshift-ci Bot commented Jun 4, 2026

Copy link
Copy Markdown

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Copy link
Copy Markdown

Walkthrough

This PR extends AWS hypervisor and OpenShift deployment infrastructure to support aarch64/Graviton instances. Configuration templates, AMI filtering logic, metal instance detection, and Metal3 image documentation are updated to enable deployment targeting aarch64 architecture alongside existing x86_64 support.

Changes

aarch64 Graviton Instance Support

Layer / File(s) Summary
Configuration template documentation for aarch64
deploy/aws-hypervisor/instance.env.template
Updated comments on RHEL_HOST_ARCHITECTURE, EC2_INSTANCE_TYPE, and RHEL_HOST_AMI to document aarch64 as a supported option and clarify that leaving RHEL_HOST_AMI empty will auto-detect the latest RHEL AMI for the configured host architecture.
AMI architecture mapping and metal instance detection
deploy/aws-hypervisor/scripts/common.sh, deploy/aws-hypervisor/scripts/create.sh
Added get_ami_arch() function to translate RHEL_HOST_ARCHITECTURE values (mapping aarch64arm64). Updated aws_ec2_describe_images() to derive ami_arch and use it in the AMI name filter pattern. Broadened metal instance type detection from c[0-9]+[a-z]*.metal to \.metal$ to treat any metal instance type as requiring MetalMachine.
Metal3 image overrides and Graviton support documentation
deploy/openshift-clusters/roles/dev-scripts/install-dev/files/config_fencing_example.sh, deploy/aws-hypervisor/README.md
Added commented-out Metal3 image override documentation with rebuild guidance and export examples. Introduced comprehensive Graviton section in AWS README covering supported instance types, required configuration, Metal3 image override patterns with shell conditional example, and aarch64-specific limitations (IPv6 IPI unsupported, explicit release image required for CI auto-discovery).
Helper tool documentation
helpers/README.md
Documented arm64 Metal3 Image Builder tool, including build purpose for metal3-io/ironic-image variants, usage examples, CLI options, prerequisites, and rebuild cadence.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • openshift-eng/two-node-toolbox#77: The main PR's Graviton/Metal3 documentation (including the IRONIC_IMAGE, VBMC_IMAGE, and SUSHY_TOOLS_IMAGE override guidance and references to the rebuild helper) directly aligns with and supports the arm64 Metal3 image builder added in the retrieved PR.

Suggested reviewers

  • eggfoobar
  • jerpeter1
🚥 Pre-merge checks | ✅ 9 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Ai-Attribution ⚠️ Warning AI tool (Claude Code) was used and documented in PR, but commit 91d612d uses Co-Authored-By trailer for AI instead of required Assisted-by or Generated-by. Update commit 91d612d to use Generated-by or Assisted-by trailer instead of Co-Authored-By for Claude Opus attribution.
✅ Passed checks (9 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding support for Graviton (aarch64) bare metal instances across the codebase.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
No-Weak-Crypto ✅ Passed The PR introduces no weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret comparisons. Changes are infrastructure-as-...
Container-Privileges ✅ Passed PR contains no container or Kubernetes manifests; only shell scripts and documentation files for AWS hypervisor deployment.
No-Sensitive-Data-In-Logs ✅ Passed No sensitive data (passwords, tokens, API keys, PII, session IDs, etc.) is logged in this PR. Logged items are public AWS resource identifiers (AMI IDs, instance IDs, public IPs) and configuration...
No-Hardcoded-Secrets ✅ Passed No hardcoded secrets detected. All assigned values are non-sensitive configuration parameters (region names, instance types, profile names) or paths to public keys. Placeholder strings use clear ma...
No-Injection-Vectors ✅ Passed No SQL concatenation, shell=True with user input, eval/exec on untrusted data, pickle.loads, unsafe yaml.load, os.system with variables, or dangerouslySetInnerHTML found. AWS CLI variables properly...

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci

openshift-ci Bot commented Jun 4, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fonta-rh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 4, 2026
@fonta-rh fonta-rh force-pushed the ocpedge-2608-graviton branch from 45b4fa3 to 7be2b57 Compare June 9, 2026 09:15
@fonta-rh

Copy link
Copy Markdown
Contributor Author

/hold depends on dev-scripts and metal3-dev-env PRs

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 12, 2026
@fonta-rh fonta-rh marked this pull request as ready for review June 12, 2026 15:41
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 12, 2026
@openshift-ci openshift-ci Bot requested review from clobrano and qJkee June 12, 2026 15:41

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@deploy/openshift-clusters/roles/dev-scripts/install-dev/files/config_fencing_example.sh`:
- Around line 32-38: The arm64 Metal3 override block in
config_fencing_example.sh incorrectly suggests setting IRONIC_IMAGE, VBMC_IMAGE,
SUSHY_TOOLS_IMAGE and referencing helpers/build-metal3-arm64.sh; update this
example to match upstream dev-scripts by removing those commented env vars and
helper reference and instead instruct users to override OPENSHIFT_RELEASE_IMAGE
for arm64 (or reference the correct upstream helper/variable if you prefer that
upstream alternative). Locate the aarch64 block (symbols: IRONIC_IMAGE,
VBMC_IMAGE, SUSHY_TOOLS_IMAGE, helpers/build-metal3-arm64.sh,
OPENSHIFT_RELEASE_IMAGE) and replace the content with a concise comment that
mirrors upstream guidance: "Do not use for arm64; override
OPENSHIFT_RELEASE_IMAGE" (or point to the proper upstream helper) so the example
is accurate.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: ae8a62bc-f711-4514-b5a3-8b32386d4132

📥 Commits

Reviewing files that changed from the base of the PR and between 88ef77a and 7be2b57.

📒 Files selected for processing (4)
  • deploy/aws-hypervisor/instance.env.template
  • deploy/aws-hypervisor/scripts/common.sh
  • deploy/aws-hypervisor/scripts/create.sh
  • deploy/openshift-clusters/roles/dev-scripts/install-dev/files/config_fencing_example.sh

@openshift-ci openshift-ci Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 18, 2026
- Broaden metal-detection regex from c-family-only to any .metal suffix,
  covering Graviton types like c7g.metal, m7g.metal, r7g.metal
- Fix AMI auto-detect for aarch64: map aarch64 → arm64 to match AWS
  naming convention, and drop GA-only filter that excluded arm64 AMIs
- Add Graviton hints to instance.env.template
- Add commented-out aarch64 Metal3 image overrides to fencing example config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@fonta-rh fonta-rh force-pushed the ocpedge-2608-graviton branch from 7be2b57 to 1b7fea1 Compare June 18, 2026 09:23
@lucaconsalvi

Copy link
Copy Markdown
Contributor

Thanks for the PR, Pablo.

Code review

Found 2 issues:

  1. AMI filter no longer restricts to GA images — removing *GA* from the filter means the query (sort_by CreationDate descending, take [0]) may silently select a pre-release (beta/RC/alpha) RHEL AMI if one was published more recently than the GA image. This is a regression for x86_64 as well as aarch64.

aws ec2 describe-images \
--query 'reverse(sort_by(Images, &CreationDate))[].[Name, ImageId, CreationDate]' \
--filters "Name=name,Values=RHEL-${RHEL_VERSION}*${ami_arch}*" \
--region "${REGION}" \
--owners amazon \
--output json \

  1. get_ami_arch() passes unrecognized RHEL_HOST_ARCHITECTURE values through silently — only aarch64 → arm64 is handled; a typo or future architecture falls through unchanged with no validation or error, producing a broken AMI filter and a silent deployment failure.

function get_ami_arch() {
local arch="${RHEL_HOST_ARCHITECTURE}"
if [[ "${arch}" == "aarch64" ]]; then
arch="arm64"
fi
echo "${arch}"
}

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@fonta-rh

Copy link
Copy Markdown
Contributor Author

Thanks for looking at this, Luca. Some context on both findings:

1. AMI filter — *GA* removal was intentional

The old filter RHEL-${RHEL_VERSION}.*GA*${RHEL_HOST_ARCHITECTURE}* had two problems:

  1. aarch64 naming mismatch — AWS AMI names use arm64, not aarch64. The old filter returned 0 results for Graviton. The get_ami_arch() mapping fixes this.

  2. *GA* pins to stale AMIs — newer RHEL 9.6 AMIs dropped _GA from the name entirely:

    RHEL-9.6.0_HVM_GA-20250423-arm64-...    (initial GA — has "GA")
    RHEL-9.6.0_HVM-20260521-arm64-...        (newer — NO "GA")
    

    The *GA* filter excluded updated AMIs on both architectures, not just aarch64. Removing it picks up the latest security-patched images.

The concern about pre-release AMIs matching is negated by --owners amazon — Amazon doesn't publish beta/RC RHEL AMIs; those only appear on Red Hat's cloud console.

2. get_ami_arch() passthrough

Only two values are supported and documented: x86_64 (default, common.sh:10) and aarch64 (documented in instance.env.template). A typo produces an empty AMI list → hard error at create.sh:61-63. Adding validation for a dev tool config with 2 supported values and a clear failure mode is unnecessary defensive programming.

@lucaconsalvi

Copy link
Copy Markdown
Contributor

Both explanations hold up:

  1. --owners amazon rules out pre-release AMIs, and the _GA drop in newer RHEL 9.6 AMIs makes the old filter actively harmful for patched images. The removal is correct.
  2. Agreed — two documented values and a hard error on empty results is sufficient for a dev tool. No need for defensive validation here.

Thanks for the context. LGTM.

@lucaconsalvi

Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 18, 2026
Document Graviton configuration in deploy/aws-hypervisor/README.md:
instance.env settings, Metal3 image overrides, and known limitations.
Add build-metal3-arm64.sh documentation to helpers/README.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 18, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
deploy/aws-hypervisor/README.md (1)

121-121: ⚡ Quick win

Remove marketing language about cost savings; focus on technical capability.

The cost comparison (~40% cheaper, hourly rates $2.32/hr vs $3.89/hr) is marketing language that violates the README guidelines requiring "technical and concise" documentation. Replace with straightforward technical statement of support for the instance types. Additionally, use the hyphenated form "bare-metal instances" when the compound adjective modifies the noun.

🛠️ Proposed update
-This toolbox supports AWS Graviton bare metal instances (`c7g.metal`, `m7g.metal`, `r7g.metal`) as hypervisors. Graviton instances are ~40% cheaper than equivalent x86_64 instances (e.g., `c7g.metal` at $2.32/hr vs `c5n.metal` at $3.89/hr).
+This toolbox supports AWS Graviton bare-metal instances (`c7g.metal`, `m7g.metal`, `r7g.metal`) as hypervisors.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/aws-hypervisor/README.md` at line 121, Remove the marketing language
from the sentence that discusses AWS Graviton bare metal instances, specifically
the cost comparison metrics (the percentage savings and hourly rate prices for
c7g.metal vs c5n.metal). Replace this section with a straightforward technical
statement that simply lists the supported Graviton instance types (c7g.metal,
m7g.metal, r7g.metal) without the cost justification. Additionally, change "bare
metal instances" to "bare-metal instances" to use the properly hyphenated
compound adjective form when modifying the noun.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@deploy/aws-hypervisor/README.md`:
- Line 121: Remove the marketing language from the sentence that discusses AWS
Graviton bare metal instances, specifically the cost comparison metrics (the
percentage savings and hourly rate prices for c7g.metal vs c5n.metal). Replace
this section with a straightforward technical statement that simply lists the
supported Graviton instance types (c7g.metal, m7g.metal, r7g.metal) without the
cost justification. Additionally, change "bare metal instances" to "bare-metal
instances" to use the properly hyphenated compound adjective form when modifying
the noun.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 79dccdc0-37f9-44f7-8cb8-47b483b2fb71

📥 Commits

Reviewing files that changed from the base of the PR and between 1b7fea1 and 91d612d.

📒 Files selected for processing (2)
  • deploy/aws-hypervisor/README.md
  • helpers/README.md
✅ Files skipped from review due to trivial changes (1)
  • helpers/README.md

@lucaconsalvi

Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants