198B MoE视觉模型GGUF量化,适合128GB统一内存本地推理
- 部署
-
- py git clone https://github.com/stepfun-ai/llama.cpp && cd llama.cpp && git checkout step3.7 && cmake -B build && cmake --build build
- py ./llama-server -m Step-3.7-flash-Q4_K_S.gguf --mmproj mmproj-Step-3.7-flash-f16.gguf -ngl 99 -c 32768