Ning Xie
|
2f1c19b245
|
[CI] change spell checker from codespell to typos (#18711)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-11 19:57:10 -07:00 |
|
Richard Zou
|
42f52cc95b
|
[CI/Build] Fix torch nightly CI dependencies (#19505)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-06-11 14:40:42 -07:00 |
|
Robert Shaw
|
97a9465bbc
|
[UX] Add Feedback During CUDAGraph Capture (#19501)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
|
2025-06-11 21:09:05 +00:00 |
|
rasmith
|
c7ea0b56cd
|
[AMD] [Quantization] Add override flag for attention dtype instead of using kv_cache_dtype trigger (#17331)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-06-11 15:53:28 -04:00 |
|
bnellnm
|
29fa5cac1c
|
[Kernels] Add activation chunking logic to FusedMoEModularKernel (#19168)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-06-11 12:53:10 -04:00 |
|
Woosuk Kwon
|
b2d9be6f7d
|
[Docs] Remove WIP features in V1 guide (#19498)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-11 09:15:03 -07:00 |
|
Jee Jee Li
|
04a55612dd
|
[Misc] Fix misleading ROCm warning (#19486)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-12 00:12:10 +08:00 |
|
David Xia
|
89b0f84e17
|
[doc] fix "Other AI accelerators" getting started page (#19457)
Signed-off-by: David Xia <david@davidxia.com>
|
2025-06-11 16:11:17 +00:00 |
|
Michael Goin
|
497a91e9f7
|
[CI] Update FlashInfer to 0.2.6.post1 (#19297)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-11 22:57:28 +08:00 |
|
runzhen
|
943ffa5703
|
[Bugfix] Update the example code, make it work with the latest lmcache (#19453)
Signed-off-by: Runzhen Wang <wangrunzhen@gmail.com>
|
2025-06-11 12:42:20 +00:00 |
|
Louie Tsai
|
5c8d34a42c
|
Support non-privileged mode on CPU for Docker and Kubernetes deployments (#19241)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-06-11 04:11:47 -07:00 |
|
Ximingwang-09
|
3c8694eabe
|
Fix some typos (#19475)
Signed-off-by: ximing.wxm <ximing.wxm@antgroup.com>
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
|
2025-06-11 10:36:04 +00:00 |
|
Michael Goin
|
7484e1fce2
|
Add cache to cuda get_device_capability (#19436)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-11 17:37:05 +08:00 |
|
Cyrus Leung
|
a2142f0196
|
Support non-string values in JSON keys from CLI (#19471)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-11 09:34:04 +00:00 |
|
Lu Fang
|
871d6b7c74
|
[Misc] Reduce warning message introduced in env_override (#19476)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-11 17:29:54 +08:00 |
|
Cyrus Leung
|
29a38f0352
|
[Doc] Support "important" and "announcement" admonitions (#19479)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-11 01:39:58 -07:00 |
|
Cyrus Leung
|
a5115f4ff5
|
[Doc] Fix quantization link titles (#19478)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-11 01:27:22 -07:00 |
|
Cyrus Leung
|
68b4a26149
|
[Doc] Update V1 User Guide for Hardware and Models (#19474)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-06-11 00:49:06 -07:00 |
|
artetaout
|
b8e809a057
|
[Kernel] Support deep_gemm for linear methods (#19085)
Signed-off-by: artetaout <lulala341@gmail.com>
|
2025-06-11 15:14:45 +08:00 |
|
Lu Fang
|
5039ec2336
|
[ROCm] Add rules to automatically label ROCm related PRs (#19405)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-11 15:09:18 +08:00 |
|
leopardracer
|
7c644ab6d5
|
Fix Typo in Documentation and Function Name (#19442)
|
2025-06-10 22:44:11 -07:00 |
|
Junhao Li
|
2d40665fe8
|
Add fused MOE config for Qwen3 30B A3B on B200 (#19455)
Signed-off-by: Junhao Li <junhao@ubicloud.com>
|
2025-06-11 13:43:46 +08:00 |
|
Lukas Geiger
|
96ada386b7
|
[Misc] Remove unused MultiModalHasher.hash_prompt_mm_data (#19422)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-06-11 05:18:57 +00:00 |
|
Michael Goin
|
1e473b3010
|
[CI] Disable failing GGUF model test (#19454)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-11 05:12:38 +00:00 |
|
Lu Fang
|
2b1e2111b0
|
Fix test_max_model_len in tests/entrypoints/llm/test_generate.py (#19451)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-11 12:54:59 +08:00 |
|
niu_he
|
a45b979d9f
|
[BugFix] Fix docker build cpu-dev image error (#19394)
Signed-off-by: niu_he <carlton2tang@gmail.com>
|
2025-06-10 20:56:40 -07:00 |
|
wang.yuqi
|
3952731e8f
|
[New Model]: Support Qwen3 Embedding & Reranker (#19260)
|
2025-06-10 20:07:30 -07:00 |
|
Richard Zou
|
77f0d465d0
|
[BugFix] Allow use_cudagraph to work with dynamic VLLM_USE_V1 (#19390)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-06-11 07:54:41 +08:00 |
|
Xu Wenqing
|
22c3c0aa4a
|
Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B-FP8 (#19401)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
|
2025-06-11 07:23:57 +08:00 |
|
py-andy-c
|
33f8dba7c6
|
[Model] use AutoWeightsLoader for commandr (#19399)
Signed-off-by: py-andy-c <pychen1017@gmail.com>
|
2025-06-10 22:42:21 +00:00 |
|
Gregory Shtrasberg
|
5241ca50d6
|
[ROCm][V1] Adding ROCm to the list of platforms using V1 by default (#19440)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-06-10 22:06:15 +00:00 |
|
Russell Bryant
|
da9b523ce1
|
[Docs] Note that alternative structured output backends are supported (#19426)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-06-10 16:20:00 +00:00 |
|
Jee Jee Li
|
b6553be1bc
|
[Misc] Slight improvement of the BNB (#19418)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
v0.9.1rc2
v0.9.1
|
2025-06-10 13:51:49 +00:00 |
|
youkaichao
|
64a9af5afa
|
Simplify ep kernels installation (#19412)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-06-10 20:06:08 +08:00 |
|
Li, Jiang
|
e4248849ec
|
[BugFix][CPU] Fix CPU CI by ignoring collection of test_pixtral (#19411)
Signed-off-by: jiang.li <jiang1.li@intel.com>
|
2025-06-10 12:02:40 +00:00 |
|
Rachel Guo
|
467bef18a3
|
[BugFix][FlashInfer] Fix attention backend interface mismatch with unexpected keyword use_irope (#19134)
Signed-off-by: Yunqiu Guo <guorachel@meta.com>
|
2025-06-10 16:48:51 +08:00 |
|
Isotr0py
|
5f1ac1e1d1
|
Revert "[v1] Add fp32 support to v1 engine through flex attn" (#19404)
|
2025-06-10 01:30:20 -07:00 |
|
Louie Tsai
|
9368cc90b2
|
Automatically bind CPU OMP Threads of a rank to CPU ids of a NUMA node. (#17930)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
|
2025-06-10 06:22:05 +00:00 |
|
Anna Pendleton
|
32b3946bb4
|
Add clear documentation around the impact of debugging flag (#19369)
Signed-off-by: Anna Pendleton <pendleton@google.com>
|
2025-06-10 06:16:09 +00:00 |
|
Reid
|
6b1391ca7e
|
[Misc] refactor neuron_multimodal and profiling (#19397)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-10 06:12:42 +00:00 |
|
Russell Bryant
|
a3f66e75d1
|
Add security warning to bug report template (#19365)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
2025-06-10 06:06:36 +00:00 |
|
Lukas Geiger
|
319cb1e351
|
[Core] Batch multi modal input using pinned memory (#19169)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-06-10 13:44:59 +08:00 |
|
Li Wang
|
1efef71645
|
[Bugfix] Fix modelscope token passed in (#19389)
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-10 13:39:37 +08:00 |
|
Nick Hill
|
646d62f636
|
[Core] Use tuple for kv cache group block ids (#19175)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-10 07:01:17 +02:00 |
|
Reid
|
6cd4ae8acd
|
[Frontend] Add tqdm_leave_pbar to control progress bar visibility (#19357)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-10 04:55:09 +00:00 |
|
Harry Mellor
|
c016047ed7
|
Fix docs/mkdocs/hooks/remove_announcement.py (#19382)
|
2025-06-09 21:36:54 -07:00 |
|
XiongfeiWei
|
9af6d22e4c
|
Use xla flag to improve the quantized model performance (#19303)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-06-10 01:28:45 +00:00 |
|
Tianyu Guo
|
4589b94032
|
[Bugfix] Fix benchmark_moe.py (#19016)
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>
|
2025-06-09 18:04:36 -07:00 |
|
Ye (Charlotte) Qi
|
cc867be19c
|
[V1] Reuse V0's memory_profiling util for gpu worker memory profiling (#19312)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-06-10 08:40:01 +08:00 |
|
Siyuan Liu
|
3a7cd627a8
|
[Misc] Fix a config typo in disable_hybrid_kv_cache_manager configuration (#19383)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
v0.9.1rc1
|
2025-06-09 16:41:51 -07:00 |
|