Chengji Yao
212007b168
[Hardware][TPU] Fix the recompiling issue in logits processor after warmup ( #14510 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-03-09 05:44:39 -04:00
Isotr0py
fb16eea48b
[Bugfix] Revert QKVCrossParallelLinear usage in Mllama to keep BNB quantization work ( #14498 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-09 04:47:45 +00:00
Yuchen Yan
73ae0b44e9
[Bugfix] Fix tqdm progress bar when SamplingParams.n > 1 ( #12428 )
...
Signed-off-by: Yuchen Yan <740987012@qq.com>
2025-03-08 20:14:53 -08:00
Jiayi Yao
6d7f037748
[Feat] Support chunked prefill for LMCache connector ( #14505 )
...
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
2025-03-08 19:30:06 -08:00
iefgnoix
10f7552789
[V1][TPU] Remove unnecessary padding for running on TPU. ( #14467 )
2025-03-08 21:56:04 -05:00
Lucas Wilkinson
b0d541947a
[Attention] Default to FlashMLA backend for MLA ( #14451 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-08 18:18:39 -08:00
Robert Shaw
5f0b53c6ea
Revert "[V1][Core] Fix memory issue with logits & sampling" ( #14504 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-08 17:43:37 -08:00
22quinn
eb8b5eb183
[V1] Support bad_words in sampler ( #13376 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-08 14:50:26 -08:00
Cyrus Leung
9513290032
[Misc] Upgrade to Python 3.9 typing for additional directories ( #14492 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-08 17:35:50 +00:00
Russell Bryant
0d5e73d30e
Update CODEOWNERS for structured output ( #14496 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-08 17:19:51 +00:00
Isotr0py
609ef61fea
[Bugfix] Fix profiling OOM and decouple encoder multimodal profiling ( #14361 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-08 16:52:34 +00:00
Lucas Wilkinson
db84f5eb3b
[Bugfix] DeepSeek Accuracy ( #14476 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-08 16:47:03 +00:00
Harry Mellor
206e2577fa
Move requirements into their own directory ( #12547 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-08 16:44:35 +00:00
Cyrus Leung
e02883c400
[Misc] Don't run ruff at all on 3rd party libs ( #14493 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-08 07:16:40 -08:00
Russell Bryant
9085aabd62
[benchmarks] Add option to use unique jsonschema for each request ( #14457 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-08 06:36:39 -08:00
Roger Wang
8d5aa466fb
[V1][Core] Fix memory issue with logits & sampling ( #13776 )
...
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-08 06:11:04 -08:00
Aaron Pham
0b7f06b447
[Misc] add use_tqdm_on_load to reduce logs ( #14407 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-08 05:57:46 -08:00
Isotr0py
03fe18ae0f
[VLM] Add TP support for Phi-4-MM ( #14453 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-08 05:57:14 -08:00
Alexander Matveev
cb8bdfade2
[V1] TPU - Add tensor parallel support via Ray ( #13618 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-08 08:19:38 -05:00
Cyrus Leung
33f227e16b
[CI/Build] Use a fixed seed to avoid flaky tests ( #14480 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-08 11:30:09 +00:00
Harry Mellor
cfd0ae8234
Add RLHF document ( #14482 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-08 09:51:39 +00:00
Lucas Wilkinson
7caff01a7b
[Build/BugFix] Fix hopper 12.8 build ( #14354 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-08 08:11:56 +00:00
Harry Mellor
be0b399d74
Add training doc signposting to TRL ( #14439 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-08 07:35:07 +00:00
Jee Jee Li
b8b0ccbd2d
[Bugfix] Make the deviceprofiler include LoRA memory. ( #14469 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-08 07:12:22 +00:00
Robin
c908a07f57
[Doc] Added QwQ-32B to the supported models list in the reasoning out… ( #14479 )
...
Signed-off-by: WangErXiao <863579016@qq.com>
2025-03-08 07:07:32 +00:00
Robin
7b6fd6e486
[Doc]add doc for Qwen models tool calling ( #14478 )
...
Signed-off-by: WangErXiao <863579016@qq.com>
2025-03-08 06:58:46 +00:00
Harry Mellor
47512b3200
Default to generation_config from model ( #12622 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-08 14:46:15 +08:00
Roger Meier
3b9c6c6947
[CI/Build] refactor: set timezone of container to UTC ( #12888 )
...
Signed-off-by: Roger Meier <r.meier@siemens.com>
2025-03-07 22:42:01 -08:00
Aviv Keshet
4aae667668
[core] add extra_args to SamplingParams ( #13300 )
...
Signed-off-by: Aviv Keshet <akeshet@scaledcognition.com>
2025-03-08 14:41:18 +08:00
Cody Yu
9f3bc0f58c
[MISC][V1] Register process killing handler only in the main thread ( #14380 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-07 22:40:06 -08:00
Mathis Felardos
980385f8c1
[Bugfix][Disaggregated] Add a check in send_kv_caches_and_hidden_states and fix the reshape of the KVCache ( #14369 )
...
Signed-off-by: Mathis Felardos <mathis@mistral.ai>
2025-03-07 22:39:31 -08:00
Tyler Michael Smith
ca7a2d5f28
Revert "[Perf] Reduce MLA CPU overheads in V1 ( #14384 )" ( #14471 )
2025-03-07 22:18:53 -08:00
Tyler Michael Smith
333681408f
[Bugfix][V1] Handle MLA in kv_cache_interface ( #14462 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-07 22:18:25 -08:00
afeldman-nm
ef64044079
[V1] Prompt logprobs + APC compatibility; prompt logprobs reqs cannot fill APC ( #13949 )
2025-03-08 01:48:12 +00:00
yarongmu-google
66e16a038e
[Bugfix] Fix torch_xla which can't handle None seed introduced in #14274 ( #14459 )
...
Signed-off-by: Yarong Mu <ymu@google.com>
2025-03-07 23:17:04 +00:00
Mark McLoughlin
e1f0835ae0
[V1][Metrics] Fix traceback with preemptions+LoRA ( #14220 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-07 15:36:16 -05:00
Nick Hill
8ed5421aaa
[V1] Eagerly remove finished requests from the batch ( #14388 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-07 10:56:00 -08:00
youkaichao
c6359e8ca6
[v1] torch.compile integration explanation ( #14437 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-08 01:55:50 +08:00
Jee Jee Li
952a074980
[Misc] Add Phi4-MM example ( #14343 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-07 17:28:52 +00:00
Jinzhen Lin
d0feea31c7
[Kernel] optimize performance of gptq marlin kernel when n is small ( #14138 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-03-07 11:53:38 -05:00
Jeremy Arnold
58abe35455
[Benchmarks] Make detokenization optional in benchmark scripts ( #11697 )
...
Signed-off-by: Jeremy Arnold <Jeremy.Arnold@amd.com>
2025-03-07 08:09:00 -08:00
York-RDWang
f7ebad2307
[Doc] Update prefix_caching.md to match the example image ( #14420 )
2025-03-07 15:29:00 +00:00
Aaron Pham
80e9afb5bc
[V1][Core] Support for Structured Outputs ( #12388 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-07 07:19:11 -08:00
iefgnoix
1e3598edeb
Use the optimized block sizes after tuning the kernel. ( #14329 )
2025-03-07 13:25:13 +00:00
Harry Mellor
f7a6bd0fa1
Fix missing kv_caches and attn_metadata in OpenVINOCausalLM ( #14271 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-07 12:30:42 +00:00
Aleksandr Malyshev
0ca3b8e01c
[BUGFIX] Skip tokenization support for throughput benchmark ( #12712 )
...
Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2025-03-07 02:51:47 -08:00
மனோஜ்குமார் பழனிச்சாமி
cc10281498
[Misc] Set default value of seed to None ( #14274 )
...
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
2025-03-07 10:40:01 +00:00
Cyrus Leung
05fb6718f0
[Bugfix] Clean up multi-modal processors ( #14417 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-07 10:33:38 +00:00
Jee Jee Li
12c29a881f
[Bugfix] Further clean up LoRA test ( #14422 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-07 10:30:55 +00:00
Peng Li
70da0c0748
correct wrong markdown syntax ( #14414 )
...
Signed-off-by: vincent-pli <justdoit.pli@gmail.com>
2025-03-07 08:01:18 +00:00