When I run the command:

```
# We recommend using the tokenizer from the base model to avoid a slow and buggy tokenizer conversion.
CUDA_VISIBLE_DEVICES=0,1 \
vllm serve /data/models/ollama-model/QwQ-32B-GGUF/qwq-32b-q4_k_m.gguf \
    --tensor-parallel-size 2 \
    --port 8132 \
    --max-model-len 1024 \
    --gpu-memory-utilization 0.7 \
    > /data/models/qwq32-q4.log 2>&1
```

the following error appears:

```
The tokenizer class you load from this checkpoint is 'LlamaTokenizer'.
The class this function is called from is 'Qwen2TokenizerFast'.
```

The tokenizer configuration embedded in the model checkpoint (qwq-32b-q4_k_m.gguf) is LlamaTokenizer, yet the code actually uses Qwen2TokenizerFast (Qwen's tokenizer). Why is that?
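For reference, the comment at the top of the command already hints at the usual workaround: supply the base model's tokenizer explicitly rather than letting it be reconstructed from the GGUF metadata. `vllm serve` accepts a `--tokenizer` argument for this. The sketch below assumes the base model is `Qwen/QwQ-32B` on the Hugging Face Hub, which is not stated in the original command; substitute a local path to the base model if running offline:

```
# A sketch, not a verified fix: pass the base model's tokenizer explicitly
# so vLLM does not convert the tokenizer embedded in the GGUF file.
# "Qwen/QwQ-32B" is an assumed base-model id, not taken from the original command.
CUDA_VISIBLE_DEVICES=0,1 \
vllm serve /data/models/ollama-model/QwQ-32B-GGUF/qwq-32b-q4_k_m.gguf \
    --tokenizer Qwen/QwQ-32B \
    --tensor-parallel-size 2 \
    --port 8132 \
    --max-model-len 1024 \
    --gpu-memory-utilization 0.7 \
    > /data/models/qwq32-q4.log 2>&1
```

With the tokenizer supplied explicitly, the GGUF-embedded tokenizer metadata (which reports LlamaTokenizer) is no longer used to build the tokenizer, so the class-mismatch message should not be triggered.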