When I run the following command:

```
# We recommend using the tokenizer from base model to avoid long-time and buggy tokenizer conversion.
CUDA_VISIBLE_DEVICES=0,1 \
vllm serve /data/models/ollama-model/QwQ-32B-GGUF/qwq-32b-q4_k_m.gguf \
--tensor-parallel-size 2 \
--port 8132 \
--max-model-len 1024 \
--gpu-memory-utilization 0.7 \
> /data/models/qwq32-q4.log 2>&1
```

I get this error:
The tokenizer class you load from this checkpoint is 'LlamaTokenizer'.
The class this function is called from is 'Qwen2TokenizerFast'.

The tokenizer configuration embedded in the model checkpoint (qwq-32b-q4_k_m.gguf) is LlamaTokenizer, but the code actually instantiates Qwen2TokenizerFast (the Qwen tokenizer). Why does this happen?
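For what it's worth, the comment at the top of the command already hints at a workaround: vLLM can be told to load the tokenizer from the original base model instead of converting the one embedded in the GGUF file. Below is a minimal sketch of that, assuming the matching base model is Qwen/QwQ-32B on the Hugging Face Hub (substitute a local path if the machine has no Hub access) and keeping all other flags as above:

```
# Sketch: pass --tokenizer so vLLM loads the base model's tokenizer directly,
# skipping the slow and reportedly buggy GGUF tokenizer conversion that the
# comment in the original command warns about.
# Assumption: Qwen/QwQ-32B is the correct base model for this GGUF quant.
CUDA_VISIBLE_DEVICES=0,1 \
vllm serve /data/models/ollama-model/QwQ-32B-GGUF/qwq-32b-q4_k_m.gguf \
--tokenizer Qwen/QwQ-32B \
--tensor-parallel-size 2 \
--port 8132 \
--max-model-len 1024 \
--gpu-memory-utilization 0.7 \
> /data/models/qwq32-q4.log 2>&1
```

With an explicit `--tokenizer`, the warning should no longer appear, since the tokenizer class is then taken from the base model's config rather than guessed from the GGUF checkpoint's metadata.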