[New Model] Add MiniMind-Omni model support#3796
Signed-off-by: xRay2016 <1150722393@qq.com>
23e957d to 7e1c06e
linyueqian left a comment
Thanks @xRay2016. I ran a hands-on end-to-end evaluation with jingyaogong/minimind-3o (3-stage thinker, talker, code2wav). The architecture is faithful to the official MiniMind-O: QK-norm transformer, neox-style RoPE, tied embeddings, and a real transformers.MimiModel Code2Wav matching eval_omni.py. However it does not complete e2e yet. 7 blocking issues are inline below.
One more that is not tied to a single line: the core thinker forward fails CUDA-graph capture with RuntimeError: Cannot copy between CPU and CUDA tensors during CUDA graph capture. This recurs even with multimodal disabled, so a CPU tensor on the forward path needs to be pinned or kept on device. enforce_eager=True is only a temporary workaround.
Environment note for reproducibility: the test box needed VLLM_USE_FLASHINFER_SAMPLER=0 (no nvcc for the flashinfer sampler JIT). That is environmental, not a PR issue. Full trace logs available on request.
from vllm_omni.model_executor.models.qwen3_tts.configuration_qwen3_tts import (
    Qwen3TTSConfig,
)
from vllm_omni.transformers_utils.configs.voxcpm import VoxCPMConfig
🔴 [blocking] vllm_omni/transformers_utils/configs/voxcpm.py does not exist, so this import raises ImportError, which silently aborts _register_omni_hf_configs(), and minimind-o is never registered. Lines 57-58 also reference CosyVoice3Config / OmniVoiceConfig, which are never imported. This looks like contamination from an unrelated branch; the block should add only minimind-o.
This was accidentally brought in during the rebase from main and is unrelated to the MiniMind work in this PR.
I’ll trim the block back so it only registers minimind-o and does not include those unrelated imports/references.
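Separately from trimming the block, the silent abort described above could be avoided by registering each config in isolation. A minimal sketch, assuming a hypothetical `CONFIG_REGISTRY` dict and `register_config` helper (neither is the actual vLLM-Omni registration API), with standard-library stand-ins as import targets so it runs anywhere:

```python
import importlib

# Hypothetical registry and helper: these names illustrate the idea and are
# not the actual vLLM-Omni registration API.
CONFIG_REGISTRY: dict[str, type] = {}


def register_config(model_type: str, module_path: str, class_name: str) -> bool:
    """Register one config class; report a failure instead of raising."""
    try:
        module = importlib.import_module(module_path)
        CONFIG_REGISTRY[model_type] = getattr(module, class_name)
        return True
    except (ImportError, AttributeError) as exc:
        print(f"skipping {model_type!r}: {exc}")
        return False


# Stand-in targets: one module that does not exist, one from the stdlib.
entries = [
    ("missing-model", "no_such_package.configs", "MissingConfig"),
    ("minimind-o", "types", "SimpleNamespace"),
]
results = {
    model_type: register_config(model_type, module_path, class_name)
    for model_type, module_path, class_name in entries
}
```

With this shape, one broken import is reported and skipped while the remaining entries are still registered.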
return bridge[-expected_len:].detach().to(torch.float32)

def thinker2talker(
🔴 [blocking] thinker2talker / talker2code2wav use a stale signature (stage_list, engine_input_source, ...) and read stage_list[id].engine_outputs. Current vLLM-Omni calls stage processors as (source_outputs, prompt, ...) on the non-chunk path, or as (transfer_manager, pooling_output, request, is_finished) on the chunk-transfer path, which is the default. On the default path the engine hangs, repeatedly failing with: thinker2talker() got an unexpected keyword argument 'transfer_manager'. Both processors need to be rewritten against the current API; stage_input_processors/qwen3_omni.py and its *_async_chunk variants are a good reference.
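A signature mismatch like this can be caught at startup instead of surfacing as a hang. A minimal sketch, assuming the chunk-path parameter names quoted in the comment above; `accepts_keywords` and the two toy processors are hypothetical and not part of vLLM-Omni:

```python
import inspect

# Parameter names taken from the review comment; the checker itself is a
# hypothetical helper, not part of vLLM-Omni.
CHUNK_PATH_PARAMS = ("transfer_manager", "pooling_output", "request", "is_finished")

_KEYWORD_KINDS = (
    inspect.Parameter.POSITIONAL_OR_KEYWORD,
    inspect.Parameter.KEYWORD_ONLY,
)


def accepts_keywords(func, names) -> bool:
    """Return True if func can be called with every name as a keyword."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True
    return all(
        name in params and params[name].kind in _KEYWORD_KINDS for name in names
    )


def stale_thinker2talker(stage_list, engine_input_source):
    """Toy stand-in with the old-style signature."""


def chunk_thinker2talker(transfer_manager, pooling_output, request, is_finished):
    """Toy stand-in with the chunk-transfer signature."""


stale_ok = accepts_keywords(stale_thinker2talker, CHUNK_PATH_PARAMS)
chunk_ok = accepts_keywords(chunk_thinker2talker, CHUNK_PATH_PARAMS)
```

A check of this kind could run once when stage processors are resolved, turning the mismatch into a single clear error.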
path = resolve_model_dir(path, "SenseVoice encoder")

try:
from funasr import AutoModel
🔴 [blocking] funasr (and librosa, soundfile) are used but not declared in any requirements file. Because the thinker advertises unbounded audio support, even a text-only request invokes SenseVoice during dummy profiling, so the stage cannot start without funasr (ModuleNotFoundError). Please declare them (the official project pins funasr==1.3.1, librosa==0.11.0, soundfile==0.13.1) and consider making this import lazy so text-only serving does not hard-require it.
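The lazy-import suggestion could look roughly like the following. This is a sketch only: `lazy_import` and `load_audio_encoder_factory` are hypothetical names, and the only facts taken from the PR are the `from funasr import AutoModel` import and the pinned version.

```python
import importlib


def lazy_import(module_name: str, attr: str, install_hint: str):
    """Import module_name.attr on first use, with an actionable error."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this code path; {install_hint}"
        ) from exc
    return getattr(module, attr)


def load_audio_encoder_factory():
    # Hypothetical call site: invoked only when an audio input actually needs
    # encoding, so importing this module never requires funasr.
    return lazy_import(
        "funasr", "AutoModel", "install it with `pip install funasr==1.3.1`"
    )
```

Note that the dummy-profiling path would also have to avoid calling the factory for text-only serving to start without funasr installed.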
return self.language_model.compute_logits(hidden_states)

def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]) -> set[str]:
🔴 [blocking] The frozen SenseVoice/SigLIP2 encoders are loaded via from_pretrained/funasr, not from the main checkpoint, so their parameters are never added to the set this method returns. vLLM's track_weights_loading then raises ValueError: weights not initialized from checkpoint. Add the audio_encoder.* / vision_encoder.* param names to loaded_weights before returning.
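The fix suggested above amounts to a set union over parameter names. A minimal torch-free sketch, where `with_external_params` is a hypothetical helper and the parameter names in the example are made up for illustration:

```python
from collections.abc import Iterable

# Prefixes named in the review comment.
EXTERNAL_PREFIXES = ("audio_encoder.", "vision_encoder.")


def with_external_params(
    loaded: set[str],
    all_param_names: Iterable[str],
    prefixes: tuple[str, ...] = EXTERNAL_PREFIXES,
) -> set[str]:
    """Return loaded plus every parameter owned by an externally loaded module."""
    external = {name for name in all_param_names if name.startswith(prefixes)}
    return loaded | external


# Illustrative names only; real names come from the model's parameters.
all_names = [
    "language_model.embed.weight",
    "audio_encoder.layer0.weight",
    "vision_encoder.proj.bias",
]
loaded = {"language_model.embed.weight"}
result = with_external_params(loaded, all_names)
```

Inside `load_weights`, the names could presumably be taken from `self.named_parameters()` just before returning.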
return None
return model.model.encoder.to(device=self.device, dtype=torch.float32)

def encode_audio_inputs(
🔴 [blocking] With the audio tower built, thinker init fails in this path with RuntimeError: expected scalar type Float but found BFloat16: the frozen SenseVoice encoder runs float32 while the model runs bfloat16. The encoder output and model dtype need to be reconciled.
return (embeddings,)
return tuple(embeddings.unbind(0))

def embed_multimodal(self, **kwargs: object) -> MultiModalEmbeddings:
🔴 [blocking] During vLLM dummy-input profiling, this returns 0 embeddings for 5 dummy mm items: AssertionError: Expected number of multimodal embeddings to match number of input items: 5, but got len(mm_embeddings)=0. The dummy-data path through embed_multimodal must produce embeddings consistent with MiniMindOmniDummyInputsBuilder.
quant_config=quant_config,
prefix=prefix,
)
for l in range(self.num_hidden_layers)
🟢 [nit] E741 ambiguous variable name l. pre-commit is currently red (this, plus trailing whitespace, ruff-format, and a missing end-of-file newline in pipeline.py). pre-commit run --all-files clears it. There are also a few unused imports (contextlib, io, logging) in this file.
self.spk_emb_size = spk_emb_size

class MiniMindOmniConfig(PretrainedConfig):
🟡 [important] MiniMindOmniConfig exposes no top-level hidden_size / num_hidden_layers; only text_config carries them. The official OmniConfig(MiniMindConfig) inherits these, so any caller reading hf_config.hidden_size directly will break. Consider mirroring the key fields at the top level.
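Mirroring the key fields could be as simple as copying them up in the constructor. A minimal sketch using a plain-Python stand-in (`ToyOmniConfig` is hypothetical and deliberately not a `PretrainedConfig`; the field values are illustrative):

```python
from types import SimpleNamespace

# Fields named in the review comment.
MIRRORED_FIELDS = ("hidden_size", "num_hidden_layers")


class ToyOmniConfig:
    """Plain-Python stand-in for a composite config; not a PretrainedConfig."""

    def __init__(self, text_config):
        self.text_config = text_config
        # Copy the key fields up so config.hidden_size keeps working for
        # callers that never look inside text_config.
        for field in MIRRORED_FIELDS:
            setattr(self, field, getattr(text_config, field))


config = ToyOmniConfig(SimpleNamespace(hidden_size=768, num_hidden_layers=8))
```

Copying at construction time keeps the top-level values serializable, at the cost of drifting if `text_config` is mutated later; a property that delegates to `text_config` would be the alternative.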
from transformers import AutoConfig, PretrainedConfig

@dataclass
🟢 [nit] @dataclass here is effectively a no-op since MiniMindConfig defines __init__ manually, and it adds a misleading generated __eq__/__repr__ onto a PretrainedConfig subclass. Recommend removing the decorator.
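The effect is easy to reproduce on a toy class. This sketch assumes a class with a hand-written `__init__` and no annotated fields; whether `MiniMindConfig` declares any annotated fields is not shown in this thread, so `ToyConfig` is a stand-in only:

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    """Stand-in with a hand-written __init__ and no annotated fields."""

    def __init__(self, hidden_size):
        self.hidden_size = hidden_size


a, b = ToyConfig(512), ToyConfig(768)
init_kept = a.hidden_size == 512  # the manual __init__ is left in place
equal_despite_diff = a == b       # generated __eq__ compares zero fields
shown = repr(a)                   # generated __repr__ lists zero fields
```

Under that assumption, two configs with different sizes compare equal and the repr hides every attribute, which is the misleading behavior the comment refers to.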
Quick review noted. CI checks look good.
Signed-off-by: xRay2016 <1150722393@qq.com>
Purpose
ref #3399
Add initial MiniMind-Omni support, including the three-stage pipeline: thinker -> talker -> code2wav.
This is a draft PR. The focused model and weight-loading paths are working locally, while full end-to-end Omni(...) inference is still being debugged.
Current Progress
Implemented so far:
AutoConfig registration.
MiniMindOmniForConditionalGeneration stage wrapper.
thinker -> talker -> code2wav.
thinker2talker
talker2code2wav
model_type="minimind-o".
MiniMindOmniForConditionalGeneration
Test Plan
Following docs/contributing/ci/tests_style.md, tests are organized by scope and placed next to the related source modules when possible.
E2E Correctness
thinker -> talker -> code2wav
E2E Performance
Measure the same prompt set on native MiniMind-Omni and vLLM-Omni.
Metrics:
Test Result
TODO