
[server] Add Cluster Health API implementation#3400

Open
swuferhong wants to merge 3 commits into apache:main from swuferhong:fluss-server-recovery

@swuferhong
Contributor

Purpose

Linked issue: close #3399

  • Add GetClusterHealth RPC to Coordinator that computes cluster health from in-memory state
  • Track inactive leaders in CoordinatorContext (a leader is marked inactive when NotifyLeaderAndIsr
    is sent, and marked active again on a successful response, provided the responding server is still the leader)
  • Handle send failures in CoordinatorRequestBatch by synthesizing error responses, so that the
    pending inactive state is cleared
  • Add client API Admin.getClusterHealth() with ClusterHealth / ClusterHealthStatus types
  • Add ClusterHealthReadinessCheck CLI tool in fluss-dist (exit 0=GREEN, 1=not ready, 2=API unsupported)
  • Add readiness-check.sh, a two-step readiness probe script (TCP check, then Cluster Health API check)
    with first-boot detection and a grace period when the API is unsupported (e.g. during a mixed-version rolling upgrade)
  • Wire the tablet-server readiness probe to readiness-check.sh in the Helm chart
  • Add documentation for Helm deployment and an upgrade guide
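To make the inactive-leader bookkeeping concrete, here is a minimal sketch of the marking rule described in the list above. It is not the PR's `CoordinatorContext` code: the class `InactiveLeaderTracker`, its method names, the plain string bucket keys, and the `isGreen()` rule (healthy only when no leader is awaiting confirmation) are hypothetical simplifications for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of inactive-leader tracking. Bucket ids are plain strings
 * here; the real coordinator uses its own bucket and server types.
 */
final class InactiveLeaderTracker {
    // Latest leader the coordinator has assigned for each bucket.
    private final Map<String, Integer> currentLeader = new HashMap<>();
    // Buckets whose leader was notified but has not yet confirmed.
    private final Set<String> inactive = new HashSet<>();

    /** Sending NotifyLeaderAndIsr marks the bucket's leader inactive. */
    void onNotifyLeaderAndIsrSent(String bucket, int leaderServerId) {
        currentLeader.put(bucket, leaderServerId);
        inactive.add(bucket);
    }

    /**
     * A response reactivates the leader only if it succeeded and the responding
     * server is still the leader; a late reply from a former leader is ignored.
     */
    void onResponse(String bucket, int respondingServerId, boolean success) {
        Integer leader = currentLeader.get(bucket);
        if (success && leader != null && leader == respondingServerId) {
            inactive.remove(bucket);
        }
    }

    int inactiveLeaderCount() {
        return inactive.size();
    }

    /** Assumed rule for this sketch: healthy only when no leader is inactive. */
    boolean isGreen() {
        return inactive.isEmpty();
    }
}
```

In this sketch a failed or stale response leaves the bucket inactive, so the health status stays non-GREEN until the current leader actually confirms.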
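The two-step readiness decision (TCP check, then the health check exit codes 0/1/2) can be sketched in Java as below. This is an illustration under stated assumptions, not the contents of readiness-check.sh or ClusterHealthReadinessCheck: the class `ReadinessProbeSketch`, its methods, and the policy of treating exit code 2 as ready once a grace period since startup has elapsed are hypothetical, and first-boot detection is not modeled.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical sketch of a two-step readiness decision; not the PR's script. */
final class ReadinessProbeSketch {
    // Exit codes of the health check tool, as listed in the PR description.
    static final int GREEN = 0;
    static final int NOT_READY = 1;
    static final int API_UNSUPPORTED = 2;

    /** Step 1: can a TCP connection be opened to the server's port? */
    static boolean tcpReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /**
     * Step 2: interpret the health check exit code.
     *
     * @param tcpOk result of step 1
     * @param healthExitCode exit code returned by the health check tool
     * @param secondsSinceStart how long this server has been running
     * @param graceSeconds assumed grace period for a coordinator that does not
     *     support the Cluster Health API yet
     */
    static boolean isReady(boolean tcpOk, int healthExitCode,
                           long secondsSinceStart, long graceSeconds) {
        if (!tcpOk) {
            return false; // Server is not listening yet.
        }
        switch (healthExitCode) {
            case GREEN:
                return true;
            case API_UNSUPPORTED:
                // Assumed mixed-version fallback: ready once the grace period has passed.
                return secondsSinceStart >= graceSeconds;
            default:
                return false; // NOT_READY or any unexpected exit code.
        }
    }
}
```

Ordering the checks this way lets a caller skip the health tool entirely while the server is not yet accepting connections.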

Brief change log

Tests

API and Format

Documentation

Contributor

@loserwang1024 loserwang1024 left a comment


I have left some comments.

Comment thread fluss-server/src/main/java/org/apache/fluss/server/RpcServiceBase.java Outdated
Comment thread helm/templates/sts-coordinator.yaml
@swuferhong swuferhong force-pushed the fluss-server-recovery branch from 64f2544 to c23ee95 on June 3, 2026 04:15
@swuferhong
Contributor Author

@loserwang1024 Comments addressed. PTAL.

Contributor

@loserwang1024 loserwang1024 left a comment


LGTM

@swuferhong swuferhong force-pushed the fluss-server-recovery branch from c23ee95 to c50fbea on June 8, 2026 07:08


Development

Successfully merging this pull request may close these issues.

[server] Support Cluster Health API for safe rolling upgrades

2 participants