Li, Jiang
|
e4248849ec
|
[BugFix][CPU] Fix CPU CI by ignore collecting test_pixtral (#19411)
Signed-off-by: jiang.li <jiang1.li@intel.com>
|
2025-06-10 12:02:40 +00:00 |
|
Rachel Guo
|
467bef18a3
|
[BugFix][FlashInfer] Fix attention backend interface mismatch with unexpected keyword use_irope (#19134)
Signed-off-by: Yunqiu Guo <guorachel@meta.com>
|
2025-06-10 16:48:51 +08:00 |
|
Isotr0py
|
5f1ac1e1d1
|
Revert "[v1] Add fp32 support to v1 engine through flex attn" (#19404)
|
2025-06-10 01:30:20 -07:00 |
|
Louie Tsai
|
9368cc90b2
|
Automatically bind CPU OMP Threads of a rank to CPU ids of a NUMA node. (#17930)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
|
2025-06-10 06:22:05 +00:00 |
|
Anna Pendleton
|
32b3946bb4
|
Add clear documentation around the impact of debugging flag (#19369)
Signed-off-by: Anna Pendleton <pendleton@google.com>
|
2025-06-10 06:16:09 +00:00 |
|
Reid
|
6b1391ca7e
|
[Misc] refactor neuron_multimodal and profiling (#19397)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-10 06:12:42 +00:00 |
|
Russell Bryant
|
a3f66e75d1
|
Add security warning to bug report template (#19365)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
2025-06-10 06:06:36 +00:00 |
|
Lukas Geiger
|
319cb1e351
|
[Core] Batch multi modal input using pinned memory (#19169)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-06-10 13:44:59 +08:00 |
|
Li Wang
|
1efef71645
|
[Bugfix] Fix modelscope token passed in (#19389)
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-10 13:39:37 +08:00 |
|
Nick Hill
|
646d62f636
|
[Core] Use tuple for kv cache group block ids (#19175)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-10 07:01:17 +02:00 |
|
Reid
|
6cd4ae8acd
|
[Frontend] Add tqdm_leave_pbar to control progress bar visibility (#19357)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-10 04:55:09 +00:00 |
|
Harry Mellor
|
c016047ed7
|
Fix docs/mkdocs/hooks/remove_announcement.py (#19382)
|
2025-06-09 21:36:54 -07:00 |
|
XiongfeiWei
|
9af6d22e4c
|
Use xla flag to improve the quantized model performance (#19303)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-06-10 01:28:45 +00:00 |
|
Tianyu Guo
|
4589b94032
|
[Bugfix] Fix benchmark_moe.py (#19016)
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>
|
2025-06-09 18:04:36 -07:00 |
|
Ye (Charlotte) Qi
|
cc867be19c
|
[V1] Reuse V0's memory_profiling util for gpu worker memory profiling (#19312)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-06-10 08:40:01 +08:00 |
|
Siyuan Liu
|
3a7cd627a8
|
[Misc] Fix a config typo in disable_hybrid_kv_cache_manager configuration (#19383)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Tag: v0.9.1rc1
|
2025-06-09 16:41:51 -07:00 |
|
Pavani Majety
|
8058c91108
|
[HOT-FIX] Add kv_sharing_target_layer_name argument to cutlass_mla backend (#19374)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-06-09 19:00:07 -04:00 |
|
Siyuan Liu
|
7d44c469fe
|
[TPU]Fix KV cache sharing tests (#19371)
|
2025-06-09 18:38:15 -04:00 |
|
liusiqian-tal
|
31f58be96a
|
[Frontend] Make TIMEOUT_KEEP_ALIVE configurable through env var (#18472)
Signed-off-by: liusiqian <liusiqian@tal.com>
|
2025-06-09 21:41:21 +00:00 |
|
Kyle Sayers
|
ebb2f383b8
|
[Quantization] Bump compressed-tensors version (#19295)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-06-09 14:33:15 -07:00 |
|
22quinn
|
c1c7dbbeeb
|
[Bugfix][Core] Prevent token lengths exceeding max_model_len in V0 (#19348)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-09 23:01:29 +08:00 |
|
Varun Sundar Rabindranath
|
5cf2daea9a
|
[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (#19298)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun <vsundarr@redhat.com>
|
2025-06-09 10:50:39 -04:00 |
|
Isotr0py
|
b8089195b4
|
[v1] Add fp32 support to v1 engine through flex attn (#19319)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-06-09 22:10:44 +08:00 |
|
Yinghai Lu
|
770e5dcdb8
|
[full_graph] Fix query_start_loc padding (#19321)
Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>
|
2025-06-09 21:32:56 +08:00 |
|
Michael Yao
|
c57c9415b1
|
[Docs] Fix a bullet list in usage/security.md (#19358)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-06-09 13:28:51 +00:00 |
|
Lu Fang
|
01810f9236
|
[CI] Introduce rules for llama auto-label (#19323)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-09 20:05:42 +08:00 |
|
Conroy Cheers
|
59abbd84f9
|
[Fix] Allow kernel compilation for CUDA capability 8.7 (#19328)
Signed-off-by: Conroy Cheers <conroy@corncheese.org>
|
2025-06-09 02:57:23 -07:00 |
|
Jee Jee Li
|
95a6568b5c
|
[CI/Build] Fix LoRA test (#19350)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-09 09:52:10 +00:00 |
|
Se7en
|
0eca5eacd0
|
[Doc] Fix description in the Automatic Prefix Caching design doc (#19333)
Signed-off-by: cr7258 <chengzw258@163.com>
|
2025-06-09 17:30:02 +08:00 |
|
Reid
|
12e5829221
|
[doc] improve ci doc (#19307)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-09 07:26:12 +00:00 |
|
Richard Zou
|
3a4d417707
|
[Misc] Cleanup compilation tests (#19343)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-06-09 15:05:44 +08:00 |
|
Kseniya Parkhamchuk
|
8335667c22
|
[Frontend] Remove unreachable code from llm.py (#19288)
Signed-off-by: KsuParkhamchuk <k.parkhamchuk@gmail.com>
|
2025-06-09 10:22:10 +08:00 |
|
Isotr0py
|
e1c4380d4c
|
[Misc] Add documentation update reminder to PR template (#19289)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-09 10:20:53 +08:00 |
|
Cyrus Leung
|
e31ae3de36
|
[Deprecation] Remove inputs arg fallback in Engine classes (#18799)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-09 10:19:56 +08:00 |
|
wang.yuqi
|
2ffb9b6e07
|
[Bugfix] model_max_length should consider max_model_len in tokenizer_config (#19201)
|
2025-06-08 07:17:53 -07:00 |
|
jennyyyyzhen
|
cda10fa3e2
|
[Multi Modal] Add an env var for message queue max chunk bytes (#19242)
Signed-off-by: yZhen <yZhen@fb.com>
Co-authored-by: yZhen <yZhen@fb.com>
|
2025-06-08 21:39:12 +08:00 |
|
Dipika Sikka
|
c123bc33f9
|
[Quantization] Add compressed-tensors NVFP4 support (#18312)
|
2025-06-08 09:05:55 -04:00 |
|
Akash kaothalkar
|
b9a1791e2c
|
[Hardware][POWER] Add IBM POWER11 Support to CPU Extension Detection (#19082)
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
|
2025-06-08 09:17:14 +00:00 |
|
Xu Wenqing
|
989dcee981
|
Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B (#19315)
Signed-off-by: Xu Wenqing <xuwq1993@qq.com>
|
2025-06-08 16:07:02 +08:00 |
|
Richard Zou
|
3d64d366e0
|
[Misc] Change tests/compile to use VLLM_V1 by default (#19302)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-06-08 16:06:48 +08:00 |
|
Richard Zou
|
eaa2e51088
|
[Bugfix] Re-enable use_cudagraph in vLLM v1 (#19299)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-06-08 08:56:12 +08:00 |
|
Chauncey
|
d77f7fb871
|
[Bugfix]: Fix TypeError: 'float' object cannot be interpreted as an integer (#19283)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-06-08 08:16:31 +08:00 |
|
Luka Govedič
|
2d8476e465
|
[BugFix][V1] Fix memory profiling bug (#18974)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-06-07 10:34:51 -07:00 |
|
pramenku
|
88be823d57
|
[AMD] Update compatible packaging version (#19309)
Signed-off-by: pramkuma <Pramendra.Kumar@amd.com>
|
2025-06-07 20:55:09 +08:00 |
|
Lifans
|
4e4f63ad45
|
[Nit][Benchmark]Fix example in benchmark_serving_structured_output.py (#19311)
Signed-off-by: Lifan Shen <lifans@meta.com>
|
2025-06-07 18:25:38 +08:00 |
|
Isotr0py
|
d2f0e7e615
|
[CI/Build] Improve Llama GGUF test robustness (#19287)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-07 17:23:28 +08:00 |
|
Reid
|
122cdca5f6
|
[Misc] refactor context extension (#19246)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-07 05:13:21 +00:00 |
|
Driss Guessous
|
cf02f9b283
|
Add FlexAttention to V1 (#16078)
Signed-off-by: drisspg <drisspguessous@gmail.com>
|
2025-06-06 21:58:55 -07:00 |
|
Aaruni Aggarwal
|
c4296b1a27
|
[CI][PowerPC] Use a more appropriate way to select testcase in tests/models/language/pooling/test_embedding.py (#19253)
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>
|
2025-06-07 11:52:52 +08:00 |
|
QiliangCui
|
66c508b137
|
[TPU][Test] Add script to run benchmark on TPU for buildkite (#19039)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-06-06 20:10:24 -07:00 |
|