Improve PowerShell execution and failure backoff for Windows disk metrics collection #17747

Merged: jt2594838 merged 1 commit into master from cp-sia on May 27, 2026

@Caideyipi (Collaborator)

Description

  • Improve the stability of Windows disk metrics collection when invoking PowerShell.
  • Prefer the powershell.exe under SystemRoot / windir to reduce PATH dependency.
  • Add a 5-minute retry backoff on failures and throttle repeated failure logs.
  • Introduce a PowerShell executor abstraction and add unit tests for both the success path and the failure-backoff path.
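
The path preference in the second bullet can be sketched as follows. This is a minimal illustration, not the PR's code: the class name `PowerShellPathSketch`, the `resolve` signature, and the injected `exists` predicate are hypothetical. The sub-path is the conventional Windows PowerShell location and is an assumption, because the PR's exact path is not shown in this conversation.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical helper; names and structure are illustrative only.
final class PowerShellPathSketch {
  // Conventional Windows PowerShell location below the Windows directory.
  // Assumption: the sub-path actually used by the PR is not shown here.
  static final String SUB_PATH = "\\System32\\WindowsPowerShell\\v1.0\\powershell.exe";

  // Bare executable name, resolved through PATH as a last resort.
  static final String FALLBACK = "powershell.exe";

  /**
   * Returns the first existing powershell.exe under SystemRoot, then windir,
   * or the bare executable name if neither candidate exists.
   */
  static String resolve(Map<String, String> env, Predicate<String> exists) {
    for (String key : new String[] {"SystemRoot", "windir"}) {
      String root = env.get(key);
      if (root == null || root.isEmpty()) {
        continue;
      }
      // Avoid a doubled separator when the variable ends with a backslash.
      String base = root.endsWith("\\") ? root.substring(0, root.length() - 1) : root;
      String candidate = base + SUB_PATH;
      if (exists.test(candidate)) {
        return candidate;
      }
    }
    return FALLBACK;
  }
}
```

A caller would presumably pass `System.getenv()` and a file-existence check; injecting both keeps the sketch testable on machines that are not running Windows.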

This PR has:

  • been self-reviewed.
    • concurrent read
    • concurrent write
    • concurrent read and write
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods.
  • added or updated version, license, or notice information
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious
    for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold
    for code coverage.
  • added integration tests.
  • been tested in a test IoTDB cluster.

Key changed/added classes (or packages if there are too many classes) in this PR

@jt2594838 jt2594838 merged commit 0cc7e9d into master May 27, 2026
42 of 43 checks passed
@jt2594838 jt2594838 deleted the cp-sia branch May 27, 2026 07:52

@luoluoyuyu (Member) left a comment

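Review summary

Fixes the problem on Windows where PowerShell/CIM errors flood the log and drag down metrics collection:

  • Resolves the absolute path of System32\...\powershell.exe
  • Backs off for 5 minutes before retrying after a failure
  • Allows a PowerShellExecutor to be injected, which makes unit testing easier

Recommend merging.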
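
The backoff and the injectable executor summarized above could fit together roughly as in the sketch below. It is an illustration under assumptions, not the merged code: the `PowerShellExecutor` interface shape, the `DiskMetricsCollectorSketch` class, the `collect` method, and the injected clock are all hypothetical. Only the names `PowerShellExecutor` and `nextPowerShellRetryTime` and the 5-minute interval come from this PR.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical shape; the PR's real PowerShellExecutor interface is not shown here.
interface PowerShellExecutor {
  List<String> execute(String command) throws IOException;
}

// Illustrative collector; not thread-safe, which the real code may need to be.
final class DiskMetricsCollectorSketch {
  // 5-minute retry backoff, as stated in the PR description.
  static final long RETRY_BACKOFF_MS = 5 * 60 * 1000L;

  private final PowerShellExecutor executor;
  private final LongSupplier clock; // injected so tests can control time
  private long nextPowerShellRetryTime = 0L;

  DiskMetricsCollectorSketch(PowerShellExecutor executor, LongSupplier clock) {
    this.executor = executor;
    this.clock = clock;
  }

  List<String> collect(String command) {
    long now = clock.getAsLong();
    if (now < nextPowerShellRetryTime) {
      // Still backing off: skip PowerShell entirely and report no data.
      return Collections.emptyList();
    }
    try {
      return executor.execute(command);
    } catch (IOException e) {
      // Remember when the next attempt is allowed.
      nextPowerShellRetryTime = now + RETRY_BACKOFF_MS;
      return Collections.emptyList();
    }
  }
}
```

In a unit test, a lambda that always throws can stand in for the executor, and advancing the injected clock past `RETRY_BACKOFF_MS` shows that the next call reaches the executor again.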

List<String> result = new ArrayList<>();
List<String> rawOutput = new ArrayList<>();
Process process = null;
if (System.currentTimeMillis() < nextPowerShellRetryTime) {

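During the backoff window after a failure, the method directly returns Collections.emptyList(), so disk metrics are briefly missing rather than carried over from the last value.

This is acceptable for monitoring (it avoids an error storm), but I suggest keeping nextPowerShellRetryTime in the handlePowerShellFailure log so that operators can confirm this is a deliberate backoff rather than a disk disappearing.

Non-blocking; the current implementation can be merged.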
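
One way to act on the logging suggestion is to put the retry time into the failure message itself. The sketch below only builds the message string; the class `BackoffLogMessageSketch`, the method `failureMessage`, and the wording are hypothetical, and how `handlePowerShellFailure` actually logs is not shown in this conversation.

```java
import java.time.Instant;

// Hypothetical helper; the real handlePowerShellFailure may format its log differently.
final class BackoffLogMessageSketch {
  /**
   * Builds a failure message that states when collection will be retried,
   * so an empty disk-metrics series can be traced to a deliberate backoff.
   */
  static String failureMessage(String reason, long nextPowerShellRetryTime) {
    return String.format(
        "Failed to collect Windows disk metrics via PowerShell: %s. "
            + "Backing off; disk metrics will be empty until %s (epoch ms %d).",
        reason, Instant.ofEpochMilli(nextPowerShellRetryTime), nextPowerShellRetryTime);
  }
}
```

Printing the time both as an ISO-8601 instant and as epoch milliseconds makes it easy to match the log line against the raw field value.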

4 participants