Ming Yang
|
772ce5af97
|
[Misc] Add dummy maverick test to CI (#21324)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-07-23 20:22:42 -07:00 |
|
Michael Goin
|
82ec66f514
|
[V0 Deprecation] Remove Prompt Adapters (#20588)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-23 16:36:48 -07:00 |
|
22quinn
|
5c9b807b34
|
[Core] Add reload_weights RPC method (#20096)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-07-23 14:24:52 -07:00 |
|
Yong Hoon Shin
|
4ac7713e32
|
Add test case for compiling multiple graphs (#21044)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-23 11:00:47 -07:00 |
|
Christian Pinto
|
8560a5b258
|
[Core][Model] PrithviMAE Enablement on vLLM v1 engine (#20577)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
|
2025-07-23 11:00:23 -07:00 |
|
Nick Hill
|
316b1bf706
|
[Tests] Add tests for headless internal DP LB (#21450)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-23 07:49:25 -07:00 |
|
Asher
|
2671334d45
|
[Model] add Hunyuan V1 Dense Model support. (#21368)
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
|
2025-07-23 03:54:08 -07:00 |
|
Yang Chen
|
6929f8b437
|
[Misc] fixed nvfp4_moe test failures due to invalid kwargs (#21246)
Signed-off-by: Yang Chen <yangche@fb.com>
|
2025-07-23 01:41:43 -07:00 |
|
Yu Chin Fabian Lim
|
32ec9e2f2a
|
Mamba V2 Test not Asserting Failures. (#21379)
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
|
2025-07-23 01:40:27 -07:00 |
|
Lu Fang
|
accac82928
|
[Sampler] Introduce logprobs mode for logging (#21398)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-07-23 01:39:25 -07:00 |
|
Jialin Ouyang
|
a1f3610fc6
|
[Core] Add basic unit test for maybe_evict_cached_block (#21400)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-07-23 00:02:02 -07:00 |
|
Isotr0py
|
4ecedd1806
|
[Bugfix] Fix nightly transformers CI failure (#21427)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-23 00:01:01 -07:00 |
|
Harry Mellor
|
f154bb9ff0
|
Simplify weight loading in Transformers backend (#21382)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-22 20:29:43 -07:00 |
|
Cyrus Leung
|
c401c64b4c
|
[CI/Build] Fix model executor tests (#21387)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-22 20:25:37 -07:00 |
|
Yiheng Xu
|
4594fc3b28
|
[Model] Add Qwen3CoderToolParser (#21396)
Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: simon-mo <xmo@berkeley.edu>
|
2025-07-22 15:05:57 -07:00 |
|
Cyrus Leung
|
35366ae57c
|
[CI/Build] Fix test failure due to updated model repo (#21375)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-22 08:39:35 -07:00 |
|
Aritra Roy Gosthipaty
|
2226d5bd85
|
[Bugfix] Decode Tokenized IDs to Strings for hf_processor in llm.chat() with model_impl=transformers (#21353)
Signed-off-by: ariG23498 <aritra.born2fly@gmail.com>
|
2025-07-22 08:27:28 -07:00 |
|
Raushan Turganbay
|
f38ee34a0a
|
[feat] Enable mm caching for transformers backend (#21358)
Signed-off-by: raushan <raushan@huggingface.co>
|
2025-07-22 08:18:46 -07:00 |
|
Wentao Ye
|
774d0c014b
|
[Perf] Cuda Kernel for Per Token Group Quant (#21083)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-22 07:27:15 -07:00 |
|
Mickaël Seznec
|
4fb56914c5
|
[perf] Add fused MLA QKV + strided layernorm (#21116)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-22 07:07:44 -07:00 |
|
Ning Xie
|
0df4d9b06b
|
[Misc] unify variable for LLM instance v2 (#21356)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-07-22 06:32:36 -07:00 |
|
Jialin Ouyang
|
ed25054577
|
[Core] Introduce popleft_n and append_n in FreeKVCacheBlockQueue to further optimize block_pool (#21222)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-07-22 06:17:47 -07:00 |
|
Kebe
|
bc8a8ce5ec
|
[Misc] Remove deprecated args in v0.10 (#21349)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-07-22 05:26:39 -07:00 |
|
Raghav Ravishankar
|
82b8027be6
|
Add arcee model (#21296)
Signed-off-by: alyosha-swamy <raghav@arcee.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-22 00:57:43 -07:00 |
|
Thomas Parnell
|
488d8a986a
|
[V1] [Hybrid] Add new test to verify that hybrid views into KVCacheTensor are compatible (#21300)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-07-21 23:31:18 -07:00 |
|
Ming Yang
|
e7b2042681
|
Revert "[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762)" (#21334)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-07-21 21:49:01 -07:00 |
|
Robert Shaw
|
29d1ffc5b4
|
[DP] Fix Prometheus Logging (#21257)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-07-21 09:11:35 -07:00 |
|
Ming Yang
|
6ece16c4fe
|
[Misc] Add dummy maverick test (#21199)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-07-21 09:08:09 -07:00 |
|
simpx
|
a0e827e07c
|
[BugFix] make utils.current_stream thread-safety (#21252) (#21253)
Signed-off-by: simpx <simpxx@gmail.com>
|
2025-07-21 09:07:36 -07:00 |
|
Woosuk Kwon
|
6dda13c86b
|
[Misc] Add sliding window to flashinfer test (#21282)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-21 08:37:49 -07:00 |
|
Zhiyu
|
6b46c4b653
|
Add Nvidia ModelOpt config adaptation (#19815)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
|
2025-07-21 10:02:58 -04:00 |
|
Ning Xie
|
d97841078b
|
[Misc] unify variable for LLM instance (#20996)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-07-21 12:18:33 +01:00 |
|
Cyrus Leung
|
042af0c8d3
|
[Model][1/N] Support multiple poolers at model level (#21227)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-21 02:22:21 -07:00 |
|
Jiayi Yan
|
7ba34b1241
|
[bugfix] fix syntax warning caused by backslash (#21251)
|
2025-07-20 17:12:10 +00:00 |
|
Raushan Turganbay
|
9499e26e2a
|
[Model] Support VLMs with transformers backend (#20543)
Signed-off-by: raushan <raushan@huggingface.co>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-07-20 13:25:50 +00:00 |
|
Seiji Eicher
|
d1fb65bde3
|
Enable v1 metrics tests (#20953)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-07-20 03:22:02 +00:00 |
|
Chengji Yao
|
3a1d8940ae
|
[TPU] support fp8 kv cache quantization (#19292)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-07-20 03:01:00 +00:00 |
|
Yuxuan Zhang
|
10eb24cc91
|
GLM-4 Update (#20736)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Lu Fang <fanglu@fb.com>
|
2025-07-19 22:40:31 +00:00 |
|
Woosuk Kwon
|
752c6ade2e
|
[V0 Deprecation] Deprecate BlockSparse Attention & Phi3-Small (#21217)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-19 13:53:17 -07:00 |
|
Thomas Parnell
|
881e3cbe3b
|
[V1] [Hybrid] Enable piecewise CUDA Graph for mamba layers (#21194)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-07-19 19:27:21 +00:00 |
|
kourosh hakhamaneshi
|
9f414a12ad
|
[BugFix] Make PD work with Ray (#21072)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
|
2025-07-19 08:46:50 -07:00 |
|
Rabi Mishra
|
c81259d33a
|
Fix/remove some broken model executor tests (#21224)
Signed-off-by: Rabi Mishra <ramishra@redhat.com>
|
2025-07-19 12:15:07 +00:00 |
|
22quinn
|
b3d82108e7
|
[Bugfix][Frontend] Fix openai CLI arg middleware (#21220)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-07-19 02:40:38 -07:00 |
|
shixianc
|
7d94577138
|
Add torch golden impl for moe_align_block_size kernel test (#20653)
Signed-off-by: Shixian Cui <shixian@amazon.com>
Co-authored-by: Shixian Cui <shixian@amazon.com>
|
2025-07-19 02:32:36 -07:00 |
|
Isotr0py
|
18e519ec86
|
[Bugfix] Fix ndarray video color from VideoAsset (#21064)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-19 02:17:16 -07:00 |
|
Jee Jee Li
|
1eaff27815
|
[V0 deprecation] Remove long context LoRA (#21169)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-19 02:15:41 -07:00 |
|
Huy Do
|
cf8cc32674
|
Fix a couple of Voxtral tests (#21218)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-07-19 09:13:41 +00:00 |
|
김종곤
|
3e04107d97
|
[Model] EXAONE 4.0 model support (#21060)
Signed-off-by: Deepfocused <rlawhdrhs27@gmail.com>
Signed-off-by: woongsik <rlawhdrhs27@gmail.com>
|
2025-07-19 14:25:44 +08:00 |
|
Woosuk Kwon
|
dd572c0ab3
|
[V0 Deprecation] Remove V0 Spec Decode workers (#21152)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-18 21:47:50 -07:00 |
|
Lucia Fang
|
9a9fda1423
|
[Core] Support Local Chunked Attention for Hybrid KV Cache (#19351)
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Lu Fang <fanglu@meta.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lu Fang <fanglu@meta.com>
|
2025-07-18 20:48:38 -07:00 |
|