From 8245fd1aafe4f415d891ee9ca74d9407c3950a3f Mon Sep 17 00:00:00 2001
From: rudy0053 <rudy0053@163.com>
Date: Wed, 19 Mar 2025 02:52:36 +0000
Subject: [PATCH] When I run the command: ``` # We recommend using the
 tokenizer from base model to avoid long-time and buggy tokenizer conversion.
 CUDA_VISIBLE_DEVICES=0,1 \ vllm serve
 /data/models/ollama-model/QwQ-32B-GGUF/qwq-32b-q4_k_m.gguf \
 --tensor-parallel-size 2 \ --port 8132 \ --max-model-len 1024 \
 --gpu-memory-utilization 0.7 \ > /data/models/qwq32-q4.log 2>&1 ```,
 an error is reported: The tokenizer class you load from this checkpoint is
 'LlamaTokenizer'. The class this function is called from is
 'Qwen2TokenizerFast'.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

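The tokenizer configuration inside the model checkpoint (qwq-32b-q4_k_m.gguf) is LlamaTokenizer, but the code actually uses Qwen2TokenizerFast (the Tongyi Qianwen tokenizer). Why is this the case?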