Skip to content

Fix CQ recovery gap and stale callback contamination#17734

Open
Caideyipi wants to merge 2 commits into
masterfrom
cq-fix
Open

Fix CQ recovery gap and stale callback contamination#17734
Caideyipi wants to merge 2 commits into
masterfrom
cq-fix

Conversation

@Caideyipi
Copy link
Copy Markdown
Collaborator

Description

This PR fixes two CQ issues in ConfigNode:

  • Recovering an ACTIVE CQ now resubmits the schedule task even if ConfigNode restarts between metadata activation and
    task submission.
  • CQ schedule tokens are now unique per create instance, so stale async callbacks from a dropped CQ cannot affect a
    recreated CQ with the same cqId.

This PR has:

  • been self-reviewed.
    • concurrent read
    • concurrent write
    • concurrent read and write
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods.
  • added or updated version, license, or notice information
  • added comments explaining the "why" and the intent of the code wherever would not be obvious
    for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold
    for code coverage.
  • added integration tests.
  • been tested in a test IoTDB cluster.

Key changed/added classes (or packages if there are too many classes) in this PR

@sonarqubecloud
Copy link
Copy Markdown

@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 37.25490% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.58%. Comparing base (7563ac8) to head (6f7a337).

Files with missing lines Patch % Lines
.../apache/iotdb/confignode/manager/cq/CQManager.java 4.34% 22 Missing ⚠️
...onfignode/procedure/impl/cq/CreateCQProcedure.java 64.28% 10 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17734      +/-   ##
============================================
+ Coverage     40.55%   40.58%   +0.03%     
  Complexity     2574     2574              
============================================
  Files          5179     5179              
  Lines        349896   349941      +45     
  Branches      44727    44732       +5     
============================================
+ Hits         141890   142018     +128     
+ Misses       208006   207923      -83     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@luoluoyuyu luoluoyuyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review 总结

locallyScheduledCQs + markCQLocallyScheduled 能有效防止 CN 重启或 procedure 重入时同一 CQ 被重复 submit,drop 时清理 map 也正确。CreateCQProcedure 侧改动需结合完整 diff 再确认 token 生成与 recovery 路径。

整体可合入,建议处理行内关于 markCQLocallyScheduled 边界条件的说明/测试。

locallyScheduledCQs.compute(
cqId,
(ignored, previousMd5) -> {
if (!md5.equals(previousMd5)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previousMd5 == null(首次 schedule)时,!md5.equals(previousMd5) 为 true,会 schedule —— 符合预期。

previousMd5md5 相同(重复 startCQScheduler 或重复 submit)时返回 false 跳过 —— 符合预期。

潜在问题:若同一 cqId 的 CQ 定义被更新(md5 变化),会再次 schedule,但 CQScheduleTask 是否已 cancel?若未 cancel,可能短暂双跑。请确认 CreateCQProcedure / ActiveCQPlan 路径会先 drop 旧 task 或 CQScheduleTask 内部有 md5 校验。

建议:补 IT 或 UT:create CQ → restart scheduler → 仅 1 个 active taskdrop CQ → map 中 entry 清除

}

public void unmarkCQLocallyScheduled(String cqId, String md5) {
locallyScheduledCQs.remove(cqId, md5);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConcurrentMap.remove(cqId, md5) 仅在 value 精确匹配时删除,与 submitSelf 失败时 unmarkCQLocallyScheduled(entry.getCqId(), entry.getMd5()) 配合正确,可避免误删新一次 schedule 写入的 md5。

👍 这是好的并发细节。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants