fix(offline-diarizer): re-embed zero-vote spans instead of arbitrary cluster-0 tie-break by ComicBit · Pull Request #751 · FluidInference/FluidAudio

ComicBit · 2026-07-03T18:57:15Z

Problem

In OfflineReconstruction.buildSegments, aggregated timeline frames whose per-cluster vote sums are all zero are tie-broken arbitrarily to cluster 0. A frame ends up with zero votes when the active local speaker slot received no embedding in any covering window (assignment −2 everywhere). As a result, whole speaker turns are silently absorbed into the surrounding speaker's segment.

Reproduced on a 97 s two-speaker fixture: a clean 1.6 s turn (segmentation gaps delimit it correctly at 26.88–28.46 s, verified from raw powerset activations) had zero cluster votes in all six covering windows and was emitted as part of the other speaker's 13 s segment.

Fix

Optional post-pass (OfflineDiarizerConfig.zeroVoteReembed, disabled by default; no behavior change unless opted in):

- Detect maximal contiguous zero-vote runs (speech-active frames, no cluster votes, bounded by gaps/voted frames) of at least minDurationSeconds (default 0.4 s).
- Re-embed each run's exact audio span (zero-padded window, weight mask covering only the span's frames so neighboring audio can't leak in).
- Assign the run to the closest speaker centroid by cosine similarity; a failed/NaN embedding falls back to the existing tie-break.
- The run becomes its own segment when its assigned cluster differs from its neighbors.
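The run-detection step can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ZeroVoteRun`, `detectZeroVoteRuns`, and the `frameDuration` parameter are hypothetical names, and the zero-vote predicate mirrors the condition quoted in the review diff (`speakerCountPerFrame[frame] > 0` with all activation sums equal to zero).

```swift
/// Hypothetical value type for a detected run (half-open frame range).
struct ZeroVoteRun: Equatable {
    let frames: Range<Int>
}

/// Finds maximal contiguous runs of frames that are speech-active but carry
/// no cluster votes, keeping only runs at least `minDurationSeconds` long.
/// `frameDuration` (seconds per frame) is an assumed parameter.
func detectZeroVoteRuns(
    speakerCountPerFrame: [Int],
    activationSums: [[Float]],
    frameDuration: Double,
    minDurationSeconds: Double = 0.4
) -> [ZeroVoteRun] {
    precondition(speakerCountPerFrame.count == activationSums.count)
    var runs: [ZeroVoteRun] = []
    var runStart: Int? = nil

    // Closes the open run (if any) at `end` and applies the min-duration filter.
    func close(at end: Int) {
        guard let start = runStart else { return }
        runStart = nil
        if Double(end - start) * frameDuration >= minDurationSeconds {
            runs.append(ZeroVoteRun(frames: start..<end))
        }
    }

    for frame in speakerCountPerFrame.indices {
        let isZeroVote = speakerCountPerFrame[frame] > 0
            && activationSums[frame].allSatisfy { $0 == 0 }
        if isZeroVote {
            if runStart == nil { runStart = frame }
        } else {
            // A gap (non-speech) or a voted frame bounds the run.
            close(at: frame)
        }
    }
    close(at: speakerCountPerFrame.count)  // a run may reach the timeline end
    return runs
}
```

Because the function only reads two arrays and iterates in index order, it stays deterministic and can be unit tested without any model.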

Results (fixture A/B, community-1 preset, min-segment 0.5)

| config | DER | speaker-error |
| --- | --- | --- |
| baseline | 0.0404 | 0.0200 |
| + zeroVoteReembed | 0.0216 | 0.0012 |

Three real turns recovered (26.9–28.4 s, 72.6–73.7 s, 87.3–87.9 s), all matching ground truth.

Tests

ZeroVoteReembedderTests: 15 cases — run detection (voted/non-speech bounds, min-duration filter, multiple runs, timeline-end), assignment (no-margin win, tie → lowest index, NaN, dimension mismatch), disabled-by-default, and synthetic-frame reconstruction integration (segment split on run boundaries, disabled path never invokes the embedder, nil-embedding parity with disabled).

Pure decision logic lives in ZeroVoteReembedder (no RNG, stable iteration order) so it is testable without CoreML models.
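As an illustration of what model-free assignment logic of this kind might look like, here is a small sketch of closest-centroid selection by cosine similarity. The function names are hypothetical, and returning `nil` for NaN or mismatched dimensions is an assumption standing in for "fall back to the existing tie-break"; the PR's actual handling may differ in detail.

```swift
/// Dot product of two equal-length vectors.
func dot(_ a: [Float], _ b: [Float]) -> Float {
    var sum: Float = 0
    for (x, y) in zip(a, b) { sum += x * y }
    return sum
}

/// Returns the index of the centroid with the highest cosine similarity to
/// `embedding`, or nil when the embedding is unusable (empty, non-finite,
/// zero-norm, or of a different dimension than a centroid).
func closestCentroid(to embedding: [Float], among centroids: [[Float]]) -> Int? {
    guard !embedding.isEmpty, embedding.allSatisfy({ $0.isFinite }) else { return nil }
    let embeddingNorm = dot(embedding, embedding).squareRoot()
    guard embeddingNorm > 0 else { return nil }

    var best: (index: Int, similarity: Float)? = nil
    for (index, centroid) in centroids.enumerated() {
        guard centroid.count == embedding.count else { return nil }
        let centroidNorm = dot(centroid, centroid).squareRoot()
        guard centroidNorm > 0, centroidNorm.isFinite else { continue }
        let similarity = dot(embedding, centroid) / (embeddingNorm * centroidNorm)
        guard similarity.isFinite else { continue }
        // Only a strictly higher similarity replaces the incumbent, so ties
        // resolve to the lowest centroid index. No margin is required.
        if let current = best, similarity <= current.similarity { continue }
        best = (index: index, similarity: similarity)
    }
    return best?.index
}
```

Iterating centroids in index order with a strict comparison is what makes the tie behavior stable without any randomness.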

Aggregated timeline frames whose per-cluster vote sums are all zero (the active local speaker slot got assignment -2 in every covering window) were tie-broken arbitrarily to cluster 0, silently absorbing whole speaker turns into the surrounding speaker's segment (e.g. the 26.876-28.455s turn on test_large.wav).

Reconstruction now detects maximal contiguous zero-vote runs (speech-active, zero votes across all clusters, >= minDurationSeconds), re-embeds each run's exact audio span via embedSpan, and assigns its frames to the closest speaker centroid regardless of margin; zero votes means there is no incumbent to defend. The run becomes its own segment on the frame-run boundaries. Failed or NaN embeddings keep the tie-break behavior.

New OfflineDiarizerConfig.ZeroVoteReembed sub-config (enabled: false by default for upstream parity, minDurationSeconds: 0.4). Pure decision logic in ZeroVoteReembedder (run detection + assignment) is model-free and unit tested; extraction is injected into buildSegments as a spanEmbedder closure.

test_large.wav A/B (min-segment=0.5): DER 0.040 -> 0.022, speaker error 0.020 -> 0.001; the 26.9-28.4s turn now emits as its own speaker segment.
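To make the "absorbed turn" versus "own segment" distinction concrete, the following sketch groups per-frame cluster labels into segments. It is a simplified stand-in for segment accumulation, not the code in buildSegments; `FrameSegment` and `segments(fromFrameLabels:)` are hypothetical names, and `nil` marks a non-speech frame.

```swift
/// Hypothetical frame-level segment; `endFrame` is exclusive.
struct FrameSegment: Equatable {
    let speaker: Int
    let startFrame: Int
    let endFrame: Int
}

/// Groups consecutive frames with the same label into segments.
/// A label change or a non-speech frame (nil) closes the open segment.
func segments(fromFrameLabels labels: [Int?]) -> [FrameSegment] {
    var result: [FrameSegment] = []
    var current: (speaker: Int, start: Int)? = nil
    for (frame, label) in labels.enumerated() {
        if let open = current, open.speaker != label {
            result.append(
                FrameSegment(speaker: open.speaker, startFrame: open.start, endFrame: frame))
            current = nil
        }
        if current == nil, let speaker = label {
            current = (speaker: speaker, start: frame)
        }
    }
    if let open = current {
        result.append(
            FrameSegment(speaker: open.speaker, startFrame: open.start, endFrame: labels.count))
    }
    return result
}
```

With frames 2 and 3 tie-broken to cluster 0, the labels `[0, 0, 0, 0, 0, 0]` yield a single segment; once those frames are reassigned to cluster 1, `[0, 0, 1, 1, 0, 0]` splits into three segments on the run boundaries.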

Copilot

Pull request overview

This PR adds an optional (disabled-by-default) post-pass to the offline diarization reconstruction pipeline to handle speech-active frames with zero cluster votes by re-embedding the exact audio span and assigning it to the closest centroid, avoiding the prior arbitrary “cluster 0” tie-break behavior that could absorb short speaker turns into neighboring segments.

Changes:

- Introduces OfflineDiarizerConfig.zeroVoteReembed to gate and configure the post-pass (min-duration threshold).
- Adds ZeroVoteReembedder pure logic (run detection + centroid assignment) and wires it into OfflineReconstruction.buildSegments via an injected spanEmbedder closure.
- Implements OfflineEmbeddingExtractor.embedSpan to embed an exact audio span with masking, and adds comprehensive unit tests covering detection, assignment, config defaults, and reconstruction integration.
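For orientation, a config surface of this shape could look like the sketch below. The field names and defaults (`enabled: false`, `minDurationSeconds: 0.4`) come from the PR description; the type name `ZeroVoteReembedSketch` and the exact validation rule (finite and strictly positive) are assumptions, since the PR text only states that `minDurationSeconds` is validated.

```swift
/// Illustrative stand-in for the ZeroVoteReembed sub-config.
struct ZeroVoteReembedSketch {
    /// Disabled by default so existing behavior is unchanged.
    var enabled: Bool = false
    /// Minimum run length (seconds) that qualifies for re-embedding.
    var minDurationSeconds: Double = 0.4

    enum ValidationError: Error, Equatable {
        case invalidMinDuration(Double)
    }

    /// Assumed rule: the threshold must be finite and strictly positive.
    func validate() throws {
        guard minDurationSeconds.isFinite, minDurationSeconds > 0 else {
            throw ValidationError.invalidMinDuration(minDurationSeconds)
        }
    }
}
```

Opting in would then amount to setting `enabled = true` and, if needed, adjusting the threshold before validation runs.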

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| Tests/FluidAudioTests/Diarizer/Offline/ZeroVoteReembedderTests.swift | Adds unit/integration tests for zero-vote run detection, assignment determinism, config validation, and reconstruction behavior with an injected embedder. |
| Sources/FluidAudio/Diarizer/Offline/Utils/ZeroVoteReembedder.swift | Adds pure run-detection and centroid-assignment logic for the zero-vote re-embed pass. |
| Sources/FluidAudio/Diarizer/Offline/Utils/OfflineReconstruction.swift | Adds optional `spanEmbedder` parameter and applies the zero-vote re-embed pass before segment accumulation. |
| Sources/FluidAudio/Diarizer/Offline/Extraction/OfflineEmbeddingExtractor.swift | Adds `embedSpan` to compute an embedding over an exact span using a zero-padded window and span-only weight mask. |
| Sources/FluidAudio/Diarizer/Offline/Core/OfflineDiarizerTypes.swift | Adds `ZeroVoteReembed` config surface + validation for `minDurationSeconds`. |
| Sources/FluidAudio/Diarizer/Offline/Core/OfflineDiarizerManager.swift | Provides `spanEmbedder` closure (using models + audioSource) to reconstruction when the feature is enabled. |
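The injected-closure wiring described in the file summary can be illustrated with a small sketch. Everything here is hypothetical: the `SpanEmbedder` signature (a synchronous closure from a frame range to an optional embedding), `resolveCluster`, and its parameters are assumptions chosen to show the fallback behavior, and the real `spanEmbedder` may well be throwing or take different arguments.

```swift
/// Assumed closure shape: embeds the audio for a frame range, or returns nil
/// when extraction fails.
typealias SpanEmbedder = (Range<Int>) -> [Float]?

/// Picks the cluster for a zero-vote run. When no embedder is injected
/// (feature disabled), or embedding/assignment fails, the pre-existing
/// tie-break result is kept.
func resolveCluster(
    forRun frames: Range<Int>,
    fallbackCluster: Int,
    spanEmbedder: SpanEmbedder?,
    assign: ([Float]) -> Int?
) -> Int {
    guard let spanEmbedder = spanEmbedder,
          let embedding = spanEmbedder(frames),
          let cluster = assign(embedding)
    else { return fallbackCluster }
    return cluster
}
```

Passing `nil` for the closure means the embedder is never invoked, and an embedder that returns `nil` produces the same result as the disabled path, which is the parity the tests describe.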

+        for frame in 0..<frameCount {
+            let isZeroVote =
+                speakerCountPerFrame[frame] > 0
+                && activationSums[frame].allSatisfy { $0 == 0 }


+        try buffer.withUnsafeMutableBufferPointer { pointer in
+            guard let baseAddress = pointer.baseAddress else { return }
+            try audioSource.copySamples(
+                into: baseAddress,
+                offset: startSample,
+                count: spanLength
+            )
+        }


+    /// Used by the short-segment relabel post-pass: the span's samples are placed at the
+    /// start of a zero-padded model window and an all-active weight mask covering only the
+    /// span's frames is applied, so the embedding reflects the span's speaker exclusively
+    /// (neighboring audio never leaks in through the mask).
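A rough sketch of the windowing idea in the doc comment above: place the span at the start of a zero-filled window and mark only the frames the span touches as active. The names (`MaskedWindow`, `makeMaskedWindow`) and the uniform sample-to-frame mapping are assumptions for illustration; the real model's frame grid and the way `embedSpan` builds its mask are not shown in this PR excerpt.

```swift
/// Hypothetical container for the padded audio and its frame-level mask.
struct MaskedWindow {
    let samples: [Float]    // span audio followed by zero padding
    let frameMask: [Float]  // 1 for frames covered by the span, 0 elsewhere
}

/// Builds a zero-padded window holding `span` at offset 0 plus a mask that is
/// active only over the span's frames. Assumes frames divide the window evenly.
/// Returns nil for empty spans or spans longer than the window.
func makeMaskedWindow(
    span: [Float],
    windowSampleCount: Int,
    frameCount: Int
) -> MaskedWindow? {
    guard !span.isEmpty, windowSampleCount > 0, frameCount > 0,
          span.count <= windowSampleCount
    else { return nil }

    var samples = [Float](repeating: 0, count: windowSampleCount)
    samples.replaceSubrange(0..<span.count, with: span)

    // ceil(span.count * frameCount / windowSampleCount): a frame that is only
    // partially covered by the span still counts as active.
    let covered = (span.count * frameCount + windowSampleCount - 1) / windowSampleCount
    let activeFrames = min(frameCount, covered)

    var mask = [Float](repeating: 0, count: frameCount)
    for frame in 0..<activeFrames { mask[frame] = 1 }
    return MaskedWindow(samples: samples, frameMask: mask)
}
```

Since everything outside the span is zero padding rather than neighboring audio, and the mask is zero there as well, nothing from adjacent speakers can contribute to the pooled embedding in this simplified picture.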


        let merged = mergeSegments(rawSegments, gapThreshold: gapThreshold)
-        return sanitize(segments: merged)
+        let output = sanitize(segments: merged)
+
+
+        return output


… on buffer failure, doc + style

- detectRuns requires exactly one active speaker per frame: an overlap frame with zero votes must keep the existing tie-break rather than be collapsed to a single re-embedded speaker (+ test)
- embedSpan throws instead of silently embedding a zero buffer when the span buffer's baseAddress is unavailable
- embedSpan doc no longer references only the fork's relabel pass
- drop leftover temporary in buildSegments return
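The first review fix narrows which frames qualify for re-embedding. A minimal sketch of that narrowed predicate, with a hypothetical function name and parameters, might read:

```swift
/// A frame qualifies only when exactly one local speaker is active and no
/// cluster received any vote. Overlap frames (two or more active speakers)
/// are excluded so they keep the existing tie-break.
func isReembedCandidate(activeSpeakerCount: Int, clusterVotes: [Float]) -> Bool {
    return activeSpeakerCount == 1 && clusterVotes.allSatisfy { $0 == 0 }
}
```

The rationale given in the commit is that re-embedding produces a single speaker label, which would wrongly collapse a zero-vote overlap frame to one speaker.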

Copilot AI review requested due to automatic review settings July 3, 2026 18:57

Copilot started reviewing on behalf of ComicBit July 3, 2026 18:57

Copilot AI reviewed Jul 3, 2026

ComicBit added a commit to ComicBit/FluidAudio that referenced this pull request Jul 3, 2026

review(zero-vote): overlap-frame exclusion + embedSpan buffer-failure…

2b26eb2

… throw (mirror of PR FluidInference#751 review fixes)

ComicBit wants to merge 2 commits into FluidInference:main from ComicBit:fix/zero-vote-reembed
