fix(neutron): derive vxlan gateway chassis from global HA_Chassis table #2082

Open
skrobul wants to merge 1 commit into main from ovn-ha-improvements

skrobul commented Jun 18, 2026

Collaborator

The previous link_vxlan_network_ha_chassis_group implementation sourced the gateway chassis from the per-router HA_Chassis_Group (neutron-<router_id>), which only exists after the external gateway port is attached to the router. Attaching an internal port before the gateway was wired up left the per-network HCG empty and baremetal ports broken.

Replace the per-router HCG lookup with a scan of the HA_Chassis table. If every record shares the same chassis_name (the single-gateway deployment we currently run), that chassis is used at priority 32767. If the table is empty or contains multiple distinct names, the handler logs and exits, preserving the existing safe-fallback behaviour.

Add a short-circuit guard: if the per-network HCG already carries chassis entries, return early to avoid redundant work.

The previous link_vxlan_network_ha_chassis_group implementation sourced
the gateway chassis from the per-router HA_Chassis_Group
(neutron-<router_id>), which only exists after the external gateway port
is attached to the router. Attaching an internal port before the gateway
was wired up left the per-network HCG empty and baremetal ports broken.

Replace the per-router HCG lookup with a scan of the global HA_Chassis
table. If every record shares the same chassis_name (single-gateway
deployment), that chassis is used at priority 32767. If the table is
empty or contains multiple distinct names the handler logs and exits,
preserving the existing safe-fallback behaviour.
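
The selection rule above could be sketched roughly as follows. This is an
illustration only, not the code in this PR: pick_single_gateway_chassis is a
hypothetical helper, and the HA_Chassis records are modelled as plain dicts
with a chassis_name key rather than real OVN NB row objects.

```python
import logging
from typing import Iterable, Mapping, Optional

LOG = logging.getLogger(__name__)

# Priority assigned to the single gateway chassis (value from the PR description).
GATEWAY_PRIORITY = 32767


def pick_single_gateway_chassis(
    rows: Iterable[Mapping[str, str]],
) -> Optional[str]:
    """Return the chassis name if every HA_Chassis row agrees on it, else None.

    Hypothetical helper: rows are plain dicts standing in for HA_Chassis records.
    """
    names = {row["chassis_name"] for row in rows}
    if not names:
        # Empty table: nothing to link, fall back to doing nothing.
        LOG.info("HA_Chassis table is empty; skipping HCG linking")
        return None
    if len(names) > 1:
        # More than one gateway chassis: the choice is ambiguous, so bail out.
        LOG.info("Multiple chassis names %s; skipping HCG linking", sorted(names))
        return None
    return next(iter(names))
```

Returning None for both the empty and the ambiguous case keeps the caller's
fallback path identical in either situation.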

Add a short-circuit guard: if the per-network HCG already carries
chassis entries (VLAN/FLAT networks, where neutron's own handler runs
first), return early to avoid redundant work.
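
A minimal sketch of that guard, under the assumption that the per-network HCG
exposes its entries as a list. FakeHCG and chassis_to_link are hypothetical
names used only for illustration; they are not taken from the PR.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeHCG:
    """Stand-in for a per-network HA_Chassis_Group row (hypothetical)."""

    name: str
    ha_chassis: List[str] = field(default_factory=list)


def chassis_to_link(
    network_hcg: FakeHCG, gateway_chassis: Optional[str]
) -> Optional[str]:
    """Return the chassis to add to the group, or None when nothing should be done."""
    if network_hcg.ha_chassis:
        # Group already carries chassis entries (e.g. VLAN/FLAT networks):
        # return early so the existing entries are left untouched.
        return None
    # Empty group: link whatever gateway chassis was found (may be None).
    return gateway_chassis
```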

Update tests: replace the per-router HCG mock with a db_list_rows mock,
add cases for an empty HA_Chassis table, multiple chassis names, and an
already-populated per-network HCG.
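
The table-related test cases could look roughly like the sketch below. It is
not the test code from this PR: gateway_chassis_from_table and make_nb_api are
hypothetical, and the db_list_rows("HA_Chassis").execute(check_error=True)
call shape is an assumption about how the handler queries the NB database.

```python
from types import SimpleNamespace
from unittest import mock


def gateway_chassis_from_table(nb_api):
    """Hypothetical stand-in for the handler's HA_Chassis lookup."""
    # Assumed call shape; the real handler may query the table differently.
    rows = nb_api.db_list_rows("HA_Chassis").execute(check_error=True)
    names = {row.chassis_name for row in rows}
    # Only an unambiguous single chassis name is accepted.
    return names.pop() if len(names) == 1 else None


def make_nb_api(chassis_names):
    """Build a mock NB API whose db_list_rows yields rows with the given names."""
    nb_api = mock.MagicMock()
    nb_api.db_list_rows.return_value.execute.return_value = [
        SimpleNamespace(chassis_name=name) for name in chassis_names
    ]
    return nb_api


def test_single_chassis():
    assert gateway_chassis_from_table(make_nb_api(["gw-1", "gw-1"])) == "gw-1"


def test_empty_table():
    assert gateway_chassis_from_table(make_nb_api([])) is None


def test_multiple_chassis_names():
    assert gateway_chassis_from_table(make_nb_api(["gw-1", "gw-2"])) is None
```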

skrobul commented Jun 22, 2026

Collaborator Author

Tested in dev:

2026-06-22 12:24:58.945 10 INFO neutron_understack.routers [None req-97978676-cc53-42cc-8cd8-747e800c4422 a1be84d3bfd460e72405f86cb0c150c871483124b837d61f11a72deae66067af 32e02632f4f04415bab5895d1e7247b7 - - 1f75c3b20fcb41ec924a71be83a5ee94 7f46f53fcb3c4625a343eaa35b5e0d04] Linking unified HCG for network 564af7cf-ff6d-4dd5-87e0-eb04a31dae77 (router 9497d564-91e4-49cf-a5f6-fdb5d7d035d8) with chassis ['a0971e5c-8eb1-4fcc-9549-d08a26a0350b'] and anchoring internal LRP lrp-9b94302b-4f11-4297-83b0-e08d7ab1dbf4
2026-06-22 12:24:59.023 10 INFO neutron.common.ovn.utils [None req-97978676-cc53-42cc-8cd8-747e800c4422 a1be84d3bfd460e72405f86cb0c150c871483124b837d61f11a72deae66067af 32e02632f4f04415bab5895d1e7247b7 - - 1f75c3b20fcb41ec924a71be83a5ee94 7f46f53fcb3c4625a343eaa35b5e0d04] HA Chassis Group neutron-564af7cf-ff6d-4dd5-87e0-eb04a31dae77 synchronized; highest priority chassis a0971e5c-8eb1-4fcc-9549-d08a26a0350b

This was on a brand-new subnet and router (it works the same on existing ones too).
Resulting HCG:

❯ kubectl exec -it ovn-ovsdb-nb-0 -- ovn-nbctl  find HA_Chassis_Group name=neutron-564af7cf-ff6d-4dd5-87e0-eb04a31dae77
Defaulted container "ovsdb" out of: ovsdb, init (init)
_uuid               : d1d14148-94e5-4dbd-b511-db4989854b81
external_ids        : {"neutron:availability_zone_hints"="", "neutron:network_id"="564af7cf-ff6d-4dd5-87e0-eb04a31dae77", "neutron:router_id"="9497d564-91e4-49cf-a5f6-fdb5d7d035d8"}
ha_chassis          : [0dce8dca-9994-490c-a643-4e8ae0a6a5bd]
name                : neutron-564af7cf-ff6d-4dd5-87e0-eb04a31dae77
❯ kubectl exec -it ovn-ovsdb-nb-0 -- ovn-nbctl list HA_Chassis 0dce8dca-9994-490c-a643-4e8ae0a6a5bd
Defaulted container "ovsdb" out of: ovsdb, init (init)
_uuid               : 0dce8dca-9994-490c-a643-4e8ae0a6a5bd
chassis_name        : "a0971e5c-8eb1-4fcc-9549-d08a26a0350b"
external_ids        : {}
priority            : 32767
❯ k exec -it ovn-ovsdb-sb-0 -- ovn-sbctl list Chassis
Defaulted container "ovsdb" out of: ovsdb, init (init)
_uuid               : 35081139-bfc5-4b97-8c63-8ea38964c7ea
encaps              : [a6e1bff8-d7bf-43bc-afa6-e3e9e264c1af]
external_ids        : {ct-no-masked-label="true", datapath-type=system, iface-types="bareudp,erspan,geneve,gre,gtpu,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan", is-interconn="false", ovn-bridge-mappings="f20-1-network:br-ex", ovn-chassis-mac-mappings="", ovn-cms-options=enable-chassis-as-gw, ovn-ct-lb-related="true", ovn-enable-lflow-cache="true", ovn-limit-lflow-cache="", ovn-memlimit-lflow-cache-kb="", ovn-monitor-all="false", ovn-trim-limit-lflow-cache="", ovn-trim-timeout-ms="", ovn-trim-wmark-perc-lflow-cache="", port-up-notif="true"}
hostname            : "1327172-hp1"
name                : "a0971e5c-8eb1-4fcc-9549-d08a26a0350b"
nb_cfg              : 0
other_config        : {ct-no-masked-label="true", datapath-type=system, iface-types="bareudp,erspan,geneve,gre,gtpu,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan", is-interconn="false", ovn-bridge-mappings="f20-1-network:br-ex", ovn-chassis-mac-mappings="", ovn-cms-options=enable-chassis-as-gw, ovn-ct-lb-related="true", ovn-enable-lflow-cache="true", ovn-limit-lflow-cache="", ovn-memlimit-lflow-cache-kb="", ovn-monitor-all="false", ovn-trim-limit-lflow-cache="", ovn-trim-timeout-ms="", ovn-trim-wmark-perc-lflow-cache="", port-up-notif="true"}
transport_zones     : []
vtep_logical_switches: []

skrobul force-pushed the ovn-ha-improvements branch from 0e1e53f to c3f0efe June 22, 2026 12:35
skrobul marked this pull request as ready for review June 22, 2026 12:36