Conroy Cheers
|
0860087aff
|
[Fix] Fall back to Gloo when NCCL backend is unavailable (#19641)
Signed-off-by: conroy-cheers <conroy@corncheese.org>
|
2025-06-17 08:42:14 +08:00 |
|
Dipika Sikka
|
6bc7b57315
|
[Quantization] Remove FP4 emulation; Fall-back to marlin for device < 100 (#19563)
|
2025-06-16 17:33:51 -04:00 |
|
Russell Bryant
|
90f9c2eb5c
|
[V1] Change return type on get_multimodal_embeddings() (#19446)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-06-16 13:32:15 -04:00 |
|
qscqesze
|
387bdf0ab9
|
[Model] Add support for MiniMaxM1ForCausalLM (shares architecture with MiniMaxText01ForCausalLM) (#19677)
Signed-off-by: QscQ <qscqesze@gmail.com>
|
2025-06-16 09:47:14 -07:00 |
|
bnellnm
|
5e5baa91aa
|
[Kernels] Use empty for modular MoE workspaces (#19667)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-06-16 14:58:01 +00:00 |
|
Chauncey
|
836d4ce140
|
[Bugfix] fix missing 'finish_reason': null in streaming chat (#19662)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-06-16 14:10:39 +00:00 |
|
Isotr0py
|
1173804dca
|
[Bugfix] Fix TP inference for Flex attention backend (#19657)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-16 11:21:37 +00:00 |
|
Shawn Tan
|
4d5424029b
|
[Feature] Allow for Granite MoE Hybrid models with _only_ shared experts. (#19652)
Signed-off-by: Shawn Tan <shawntan@ibm.com>
|
2025-06-16 11:14:18 +00:00 |
|
Nick Hill
|
ee35e96ac3
|
[BugFix] Don't catch BaseException when dumping execute_model errors (#19626)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-16 11:01:08 +00:00 |
|
Szymon Ożóg
|
dec66d253b
|
[Kernel] GGUF MMVQ kernel for multiple input vectors (#18754)
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
|
2025-06-16 17:33:26 +08:00 |
|
wang.yuqi
|
f40f763f12
|
[CI] Add mteb testing for rerank models (#19344)
|
2025-06-16 01:36:43 -07:00 |
|
Ning Xie
|
26bc46ef89
|
[MISC] typo fix (#19672)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-16 07:18:49 +00:00 |
|
Chengji Yao
|
a77aea59fd
|
[TPU] support attention head dim smaller than 128 (#19620)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-06-16 06:40:53 +00:00 |
|
Ye (Charlotte) Qi
|
b692e9cd07
|
[Misc] Fix skipped max-model-len validation when deriving max model length from tokenizer config (#19660)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-06-16 06:30:29 +00:00 |
|
Francesco Bertolotti
|
367871a469
|
[Misc][Frontend] passthrough bad_words (#19564)
Signed-off-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>
Co-authored-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
|
2025-06-16 05:05:13 +00:00 |
|
quanliu
|
92183b41f3
|
[Bugfix][Core] Prefix caching causes incorrect outputs due to outdated ComputedBlocksTracker (#18957)
Signed-off-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
Co-authored-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
|
2025-06-15 21:56:37 -07:00 |
|
Isotr0py
|
a5e7242d5f
|
[Misc] Remove duplicate multiproc method setting for CPU platform (#19649)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-06-16 02:26:58 +00:00 |
|
Woosuk Kwon
|
055915e6ce
|
Enable prefix caching with full cuda graphs (#19617)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-15 01:05:05 -07:00 |
|
22quinn
|
0b73736a0d
|
[Kernel] Raise verbose error and consolidate num_heads/num_kv_heads divisibility check (#19339)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-15 13:43:48 +08:00 |
|
Lu Fang
|
ee1531bc38
|
[Bugfix][2/n] Fix speculative decoding CI - Fix test_ngram_e2e_greedy_correctness (#19644)
|
2025-06-14 21:15:41 -07:00 |
|
maobaolong
|
08500011d3
|
[Fix] Convert kv_transfer_config from dict to KVTransferConfig (#19262)
|
2025-06-14 12:32:07 -07:00 |
|
Konrad Zawora
|
861a0a0a39
|
[Bugfix] Don't attempt to use triton if no driver is active (#19561)
|
2025-06-14 12:30:54 -07:00 |
|
Isotr0py
|
2db9044ab6
|
[Bugfix] Fix auto dtype casting for BatchFeature (#19316)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-06-14 15:13:08 +00:00 |
|
Saheli Bhattacharjee
|
d1e34cc9ac
|
[V1][Metrics] Deprecate metrics with gpu_ prefix for non GPU specific metrics. (#18354)
Signed-off-by: Saheli Bhattacharjee <saheli@krai.ai>
|
2025-06-14 11:07:36 +08:00 |
|
Nick Hill
|
bd517eb9fe
|
[BugFix] Fix DP Coordinator incorrect debug log message (#19624)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-14 00:18:03 +00:00 |
|
Woosuk Kwon
|
aafbbd981f
|
[torch.compile] Use custom ops when use_inductor=False (#19618)
|
2025-06-13 15:05:54 -07:00 |
|
Luka Govedič
|
3597b06a4f
|
[CUDA] Enable full cudagraph for FlashMLA (#18581)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-06-13 18:12:26 +00:00 |
|
qscqesze
|
a24cb91600
|
[Model] Fix minimax model cache & lm_head precision (#19592)
Signed-off-by: qingjun <qingjun@minimaxi.com>
|
2025-06-13 12:08:20 +00:00 |
|
Nick Hill
|
7e8d97dd3f
|
[BugFix] Honor enable_caching in connector-delayed kvcache load case (#19435)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-13 09:46:32 +00:00 |
|
youkaichao
|
d70bc7c029
|
[torch.compile] reorganize the cache directory to support compiling multiple models (#19064)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-06-13 15:23:25 +08:00 |
|
Boyuan Feng
|
ce688ad46e
|
use base version for version comparison (#19587)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-06-13 15:09:34 +08:00 |
|
汪志鹏
|
cefdb9962d
|
[Fix] The zip function in Python 3.9 does not have the strict argument (#19549)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-06-13 14:57:48 +08:00 |
|
Li, Jiang
|
6458721108
|
[CPU] Refine default config for the CPU backend (#19539)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-06-13 13:27:39 +08:00 |
|
Hyogeun Oh (오효근)
|
bb4a0decef
|
[Misc] Correct broken docs link (#19553)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
|
2025-06-12 22:27:13 -07:00 |
|
qizixi
|
c68698b326
|
[Bugfix] Fix EAGLE vocab embedding for multimodal target model (#19570)
Signed-off-by: qizixi <qizixi@meta.com>
|
2025-06-12 23:09:19 -04:00 |
|
Varun Sundar Rabindranath
|
e3b12667d4
|
[BugFix] Fix Batched DeepGemm Experts (#19515)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-06-12 20:43:02 -06:00 |
|
Russell Bryant
|
c57bb199b3
|
[V1] Resolve failed concurrent structured output requests (#19565)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-06-12 23:30:09 +00:00 |
|
Michael Goin
|
a3319f4f04
|
[Bugfix] Enforce contiguous input for dynamic_per_token FP8/INT8 quant (#19452)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-12 15:39:15 -04:00 |
|
Varun Sundar Rabindranath
|
9d880f594d
|
[Misc] Turn MOE_DP_CHUNK_SIZE into an env var (#19506)
|
2025-06-12 18:01:16 +00:00 |
|
Ekagra Ranjan
|
017ef648e9
|
[Spec Decode][Benchmark] Generalize spec decode offline benchmark to more methods and datasets (#18847)
|
2025-06-12 10:30:56 -07:00 |
|
Luka Govedič
|
f98548b9da
|
[torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass (#16756)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
|
2025-06-12 08:31:04 -07:00 |
|
mobicham
|
96846bb360
|
Fix TorchAOConfig skip layers (#19265)
Signed-off-by: mobicham <hicham@mobiuslabs.com>
|
2025-06-12 22:22:53 +08:00 |
|
Nicolò Lucchesi
|
1129e2b1ab
|
[V1][NixlConnector] Drop num_blocks check (#19532)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-06-12 12:36:14 +00:00 |
|
Jee Jee Li
|
73e2e0118f
|
[Quantization] Improve AWQ logic (#19431)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-06-12 11:02:11 +00:00 |
|
jmswen
|
c9280e6346
|
[Bugfix] Respect num-gpu-blocks-override in v1 (#19503)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
|
2025-06-12 11:00:23 +00:00 |
|
Michael Goin
|
af09b3f0a0
|
[Bugfix][V1] Allow manual FlashAttention for Blackwell (#19492)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-12 10:40:24 +00:00 |
|
rasmith
|
2e090bd5df
|
[AMD][Kernel][BugFix] fix test_rocm_compressed_tensors_w8a8 for rocm (#19509)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-06-12 07:14:24 +00:00 |
|
wonjun Jang
|
1b0b065eb5
|
[BugFix] Handle missing sep_token for Qwen3-Reranker in Score API (#19522)
Signed-off-by: strutive07 <strutive07@gmail.com>
|
2025-06-12 07:00:47 +00:00 |
|
Nick Hill
|
d5bdf899e4
|
[BugFix] Work-around incremental detokenization edge case error (#19449)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-12 06:43:20 +00:00 |
|
22quinn
|
7e3e74c97c
|
[Frontend] Improve error message in tool_choice validation (#19239)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-12 01:13:00 -04:00 |
|