6796 Commits

Author SHA1 Message Date
Chauncey
13affc432d
[Misc] Remove redundant code (#16098)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-05 20:03:50 -07:00
Reid
d8f094a92a
[Misc] format output for encoder_decoder.py (#16095)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-05 19:57:18 -07:00
Harry Mellor
97ae6d777f
Fix some capitalisations in generated examples doc titles (#16094)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-05 13:44:03 +00:00
yihong
6baeee70d1
Revert "doc: add info for macos clang errors (#16049)" (#16091)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-05 11:51:51 +00:00
Reid
d2517a4939
[doc] fix 404 (#16082)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-05 11:39:18 +00:00
yihong
6342adc438
fix: support clang17 for macos and fix the real libomp (#16086)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-05 11:00:12 +00:00
Kevin H. Luu
0adba91547
[CI] Fix benchmark script level (#16089) 2025-04-05 03:36:01 -07:00
Tristan Leclercq
4285e423a6
[Misc] Auto detect bitsandbytes pre-quantized models (#16027)
Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>
2025-04-04 23:30:45 -07:00
Woosuk Kwon
63375f0cdb
[V1][Spec Decode] Update N-gram Proposer Interface (#15750)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
v0.8.3rc1
2025-04-04 16:32:54 -07:00
Michael Goin
70ad3f9e98
[Bugfix][TPU] Fix V1 TPU worker for sliding window (#16059)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-04-04 23:31:19 +00:00
bnellnm
d6fc629f4d
[Kernel][Minor] Re-fuse triton moe weight application (#16071)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-04-04 23:27:34 +00:00
Roger Wang
af51d80fa1
Revert "[V1] Scatter and gather placeholders in the model runner" (#16075) 2025-04-04 14:50:57 -07:00
Cyrus Leung
f5722a5052
[V1] Scatter and gather placeholders in the model runner (#15712)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-04-04 21:26:44 +00:00
Nick Hill
651cf0fec1
[V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queue (#15906)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-04 12:56:43 -07:00
Kevin H. Luu
4dc52e1c53
[CI] Reorganize .buildkite directory (#16001)
Signed-off-by: kevin <kevin@anyscale.com>
2025-04-04 12:16:20 -07:00
Michael Goin
4708f13a9c
[Bugfix] Fix default behavior/fallback for pp in v1 (#16057)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-04 17:58:08 +00:00
Gregory Shtrasberg
a6d042df0a
[ROCm][Bugfix] Bring back fallback to eager mode removed in #14917, but for ROCm only (#15413)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-04-04 09:40:37 -07:00
Gregory Shtrasberg
40a36ccfeb
[ROCm][Bugfix] Use platform specific FP8 dtype (#15717)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-04-04 09:40:20 -07:00
Ilya Markov
ef608c37a7
[Distributed] [ROCM] Fix custom allreduce enable checks (#16010)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-04-04 09:39:08 -07:00
Li, Jiang
2386803f2a
[CPU] Change default block_size for CPU backend (#16002)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-04-04 09:39:05 -07:00
Ziji Shi (Steven)
95862f7b4d
[Benchmark][Doc] Update throughput benchmark and README (#15998)
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-04-04 09:39:02 -07:00
Isotr0py
230b131b54
[Bugfix][kernels] Fix half2float conversion in gguf kernels (#15995)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-04 09:38:58 -07:00
liuzhenwei
0812d8dd41
[Hardware][Gaudi][BugFix] fix arguments of hpu fused moe (#15945)
Signed-off-by: zhenwei <zhenweiliu@habana.ai>
2025-04-04 09:38:55 -07:00
Jonghyun Choe
bf7e3c51ae
[Model] use AutoWeightsLoader for baichuan, gpt-neox, mpt (#15939)
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>
2025-04-04 09:38:52 -07:00
Mark McLoughlin
a35a8a8392
[V1][Spec Decode] Avoid logging useless nan metrics (#16023)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-04-04 08:52:41 -07:00
yihong
4ef0bb1fcf
doc: add info for macos clang errors (#16049)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-04 14:58:16 +00:00
Chengji Yao
fadc59c0e6
[TPU][V1] Remove ragged attention kernel parameter hard coding (#16041)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-04 07:48:50 -04:00
Reid
86cbd2eee9
[Misc] improve gguf check (#15974)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-04 01:33:36 +00:00
Huy Do
092475f738
[ROCm] Tweak the benchmark script to run on ROCm (#14252) 2025-04-03 17:12:48 -07:00
bnellnm
dcc56d62da
[Bugfix] Fix function names in test_block_fp8.py (#16033)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-04-03 23:01:34 +00:00
Robert Shaw
f15e70d906
[TPU] Switch Test to Non-Sliding Window (#15981)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-04-03 14:28:45 -07:00
iefgnoix
b6be6f8d1e
[TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (#15732)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-04-03 14:23:28 -07:00
Alexei-V-Ivanov-AMD
03a70eacaf
Re-enable the AMD Testing for the passing tests. (#15586)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-04-03 11:05:17 -07:00
yarongmu-google
45b1ff7a25
[Misc][Performance] Advance tpu.txt to the most recent nightly torch … (#16024) 2025-04-03 17:32:54 +00:00
bnellnm
15ba07ef25
[Minor] Fused experts refactor (#15914)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-04-03 10:19:38 -07:00
Liangfu Chen
d2b58ca203
[Neuron][kernel] Fuse kv cache into a single tensor (#15911)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
2025-04-03 09:51:32 -07:00
Kyle Sayers
82e7e19a6e
[SupportsQuant] Chameleon, Chatglm, Commandr (#15952)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-04-03 08:25:22 -07:00
Kyle Sayers
421c462948
[SupportsQuant] Bert, Blip, Blip2, Bloom (#15573)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-04-03 08:23:19 -07:00
yihong
84884cd9ac
fix: tiny fix make format.sh excutable (#16015)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-03 15:18:05 +00:00
Reid
a43aa183dc
[doc] update contribution link (#15922)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-03 10:47:31 +00:00
wwl2755
463bbb1835
[Bugfix][V1] Fix bug from putting llm_engine.model_executor in a background process (#15367)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-04-03 07:32:10 +00:00
youkaichao
5e125e74d1
[misc] improve error message for "Failed to infer device type" (#15994)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-04-03 14:45:03 +08:00
Ziji Shi (Steven)
06f21ce7a5
[Benchmark] Add AIMO Dataset to Benchmark (#15955)
Signed-off-by: Ziji Shi <shi.ziji.sm@gmail.com>
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
2025-04-03 06:09:18 +00:00
Aleksandr Malyshev
57a810db9c
[ROCM][V0] PA kennel selection when no sliding window provided (#15982)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2025-04-03 05:28:44 +00:00
youkaichao
8b664706aa
[bugfix] add seed in torchrun_example.py (#15980)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-04-03 12:25:01 +08:00
yihong
37bfee92bf
fix: better error message for get_config close #13889 (#15943)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-03 03:53:19 +00:00
Aleksandr Malyshev
e73ff24e31
[ROCM][KERNEL] Paged attention for V1 (#15720)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>
2025-04-02 19:48:00 -07:00
Nicolò Lucchesi
bd7599d34a
[V1][TPU] Do not compile sampling more than needed (#15883)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-03 01:36:01 +00:00
Chengji Yao
01b6113659
[TPU] optimize the all-reduce performance (#15903)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-03 00:25:14 +00:00
Hyesoo Yang
1b84eff03a
[V1][TPU] TPU-optimized top-p implementation (avoids scattering). (#15736)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b.c.tpu-prod-env-large-adhoc.internal>
2025-04-02 17:18:08 -07:00