Qwen3.5-9B-MTP-GGUF (unsloth)
Qwen3.5-9B 多模态模型,MTP 投机解码,本地快速运行
- 部署
-
- py git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp && cmake -B build -DGGML_CUDA=ON && cmake --build build --target llama-server
- py ./llama.cpp/build/bin/llama-server -hf unsloth/Qwen3.5-9B-MTP-GGUF:UD-Q4_K_XL -ngl 99 -fa on --spec-type draft-mtp