```
# We recommend using the tokenizer from the base model to avoid the slow and buggy tokenizer conversion from GGUF.
CUDA_VISIBLE_DEVICES=0,1 \
vllm serve /data/models/ollama-model/QwQ-32B-GGUF/qwq-32b-q4_k_m.gguf \
--tensor-parallel-size 2 \
--port 8132 \
--max-model-len 1024 \
--gpu-memory-utilization 0.7 \
> /data/models/qwq32-q4.log 2>&1
```

Running this command produced the following error:

```
The tokenizer class you load from this checkpoint is 'LlamaTokenizer'.
The class this function is called from is 'Qwen2TokenizerFast'.
```

So the tokenizer configuration inside the model checkpoint (qwq-32b-q4_k_m.gguf) declares LlamaTokenizer, but what is actually being used is Qwen2TokenizerFast (the Qwen tokenizer). Why is that?
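To see where the LlamaTokenizer label comes from, it may help to look at what the GGUF file itself declares. A minimal inspection sketch, assuming the `gguf` pip package (which ships a `gguf-dump` command); the grep pattern is only illustrative:

```
# Dump the GGUF key-value metadata and filter for the architecture and
# tokenizer keys; the tokenizer.ggml.* entries record which tokenizer
# family the file was converted with, separate from the model architecture.
pip install gguf
gguf-dump /data/models/ollama-model/QwQ-32B-GGUF/qwq-32b-q4_k_m.gguf \
| grep -E 'general\.architecture|tokenizer\.ggml'
```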
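The comment at the top of the script already points at the documented way around this: let vLLM load the tokenizer from the base model instead of converting the one embedded in the GGUF. A sketch using vLLM's `--tokenizer` option; `Qwen/QwQ-32B` is my assumption for the base repo, so substitute a local path if you serve offline:

```
# Same serve command, but with the tokenizer taken from the base model,
# so no GGUF tokenizer conversion is needed (which should also avoid the
# class-mismatch warning above).
CUDA_VISIBLE_DEVICES=0,1 \
vllm serve /data/models/ollama-model/QwQ-32B-GGUF/qwq-32b-q4_k_m.gguf \
--tokenizer Qwen/QwQ-32B \
--tensor-parallel-size 2 \
--port 8132 \
--max-model-len 1024 \
--gpu-memory-utilization 0.7 \
> /data/models/qwq32-q4.log 2>&1
```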