Add bitnet-embeddings-270m model adaptation with F16 and I2_S GGUF conversion #564

Open
isHuangXin wants to merge 2 commits into microsoft:main from isHuangXin:dev-bitnet-embedding-270m

Conversation

@isHuangXin

  • Add LLM_ARCH_GEMMA3 in llama.cpp for gemma3_text model type
    (embedding scaling, GELU, post-attn/post-FFN norms, GQA)
  • Add GGUF conversion support for Gemma3-based 270m models
    (SPM tokenizer, RMSNorm w+1 offset, arch-specific tensor mapping)
  • Add tokenizer hash for multilingual-e5-0.6b-260311
  • Add conversion documentation
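
For readers unfamiliar with the "RMSNorm w+1 offset" item, here is a minimal sketch of the idea. Gemma-style RMSNorm scales the normalized activations by `(1 + w)`, so a converter can add 1 to each stored norm weight once and let the runtime apply a plain RMSNorm. The function names below are hypothetical and the code is pure Python for illustration only; it is not taken from this PR's conversion script.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Plain RMSNorm: normalize by root-mean-square, then scale by weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def gemma_rms_norm(x, weight, eps=1e-6):
    """Gemma-style RMSNorm: the learned weight is an offset from 1."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * (1.0 + w) for v, w in zip(x, weight)]

def fold_offset(weight):
    """Hypothetical conversion step: bake the +1 into the stored weight."""
    return [w + 1.0 for w in weight]

x = [1.0, -2.0, 3.0]
w = [0.1, -0.2, 0.0]
reference = gemma_rms_norm(x, w)          # checkpoint semantics
converted = rms_norm(x, fold_offset(w))   # runtime semantics after conversion
```

With the offset folded in at conversion time, `reference` and `converted` agree up to floating-point rounding, which is why the runtime does not need a Gemma-specific norm kernel under this assumption.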

…nversion

- Add GGUF conversion tool for bitnet-embeddings-0.6b (safetensors -> F16/I2_S GGUF)
- Add Qwen3 architecture support in llama.cpp submodule with per-projection RMSNorm
- Add I2_S ternary quantization (2-bit packed -1/0/+1) for lossless precision
- Add f16 norm weight support for correct embedding inference
- Guard bitnet-lut-kernels.h include with TL1/TL2 preprocessor checks
- Update llama.cpp submodule to dev-bitnet-embedding-0.6b branch
- Document F16 (from multilingual-e5-0.6b) and I2_S (from bitnet-embeddings-0.6b) conversion process
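
The "2-bit packed -1/0/+1" idea in the list above can be sketched as follows. This is an illustrative packing only: the value-to-code mapping, the order of values within a byte, and the function names are assumptions, and the actual I2_S layout (including how scales are stored) is not shown in this PR text.

```python
def pack_ternary(values):
    """Pack ternary weights (-1, 0, +1) four per byte, 2 bits each.

    Assumed mapping for illustration: -1 -> 0b00, 0 -> 0b01, +1 -> 0b10.
    """
    if len(values) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(values), 4):
        byte = 0
        for j, v in enumerate(values[i:i + 4]):
            if v not in (-1, 0, 1):
                raise ValueError(f"non-ternary value: {v}")
            byte |= (v + 1) << (2 * j)  # lowest bits hold the first value
        out.append(byte)
    return bytes(out)

def unpack_ternary(data):
    """Inverse of pack_ternary: recover the ternary values exactly."""
    return [((b >> (2 * j)) & 0b11) - 1 for b in data for j in range(4)]

weights = [-1, 0, 1, 1, 0, 0, -1, 1]
packed = pack_ternary(weights)
restored = unpack_ternary(packed)
```

Because every ternary value maps to a distinct 2-bit code, the round trip is exact, which is the sense in which such a packing loses no precision for weights that are already ternary.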
…nversion

- Add LLM_ARCH_GEMMA3 in llama.cpp for gemma3_text model type
  (embedding scaling, GELU, post-attn/post-FFN norms, GQA)
- Add GGUF conversion support for Gemma3-based 270m models
  (SPM tokenizer, RMSNorm w+1 offset, arch-specific tensor mapping)
- Add tokenizer hash for multilingual-e5-0.6b-260311
- Add conversion documentation