7568 Commits

Author SHA1 Message Date
Cyrus Leung
70ca5484f5
[Doc] Update notes (#20668)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-09 03:46:36 -07:00
Thomas Parnell
5358cce5ff
[V1] [Doc] Update V1 docs for Mamba models (#20499)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-09 01:02:41 -07:00
Chauncey
2155e95ef1
[Bugfix] Fix the issue where reasoning_content is None when Thinkng is enabled and tool_choice is set to 'required'. (#20662)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-09 07:39:58 +00:00
qscqesze
f95570a52d
[Docs] fix minimax tool_calling docs error (#20667)
Signed-off-by: qingjun <qingjun@minimaxi.com>
2025-07-09 00:37:07 -07:00
Kunshang Ji
b6e7e3d58f
[Intel GPU] support ray as distributed executor backend for XPU. (#20659)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-07-09 00:36:58 -07:00
Dmitry Rogozhkin
e760fcef22
[XPU] Use spawn with XPU multiprocessing (#20649)
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2025-07-09 00:34:28 -07:00
B-201
6bbf1795b7
[Misc] Fix the size of batched_dummy_mm_inputs in profile_run (#20434)
Signed-off-by: bk-201 <joy25810@foxmail.com>
2025-07-08 20:15:44 -07:00
Michael Goin
9e0ef888f0
Fix bullets in incremental_build.md (#20642) 2025-07-09 11:03:41 +08:00
Duncan Moss
97abeb1daa
[feat] enable SM100 CUTLASS block scaled group gemm for smaller batch sizes (#20640)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
2025-07-09 11:03:35 +08:00
zhrrr
34dad19e7b
[Bugfix] set default set cuda_graph_sizes to min(self.max_num_seqs * 2, 512) (#20628)
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
2025-07-09 11:02:51 +08:00
Akash kaothalkar
6db31e7a27
[Hardware][PPC64LE] Enable V1 for ppc64le and ARM (#20554)
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Nikhil Gupta <nikhil.gupta2@arm.com>
2025-07-08 20:00:41 -07:00
Ricardo Decal
977180c912
[Docs] Improve documentation for multi-node service helper script (#20600)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-08 19:44:26 -07:00
Ratnam Parikh
c40784c794
[BugFix][Intel GPU] Use refactored API for dist_backend in V1 worker (#20596)
Signed-off-by: ratnampa <ratnam.parikh@intel.com>
2025-07-08 19:44:23 -07:00
kourosh hakhamaneshi
baed180aa0
[tech debt] Revisit lora request model checker (#20636)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2025-07-09 09:42:41 +08:00
Kunshang Ji
0b407479ef
[misc]refactor Platform.set_device method (#20262)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-07-09 01:39:47 +00:00
Wenxin Cheng
5eaf570050
Replace multiply_add with homogeneous_multiply_add to Address Clang Template Parameter Issue (#20142)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-07-09 00:30:18 +00:00
QiliangCui
d8ee5a2ca4
[TPU][Bugfix] disable phi-3 test (#20632)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-08 23:14:26 +00:00
Isotr0py
b9fca83256
[Bugfix] Fix GLM-4.1-V video prompt update (#20635)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-08 23:13:58 +00:00
Cyrus Leung
32dffc2772
[Core] Rename get_max_tokens_per_item for backward compatibility (#20630)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-08 23:11:30 +00:00
Ming Yang
c438183e99
[Bugfix] Fix topk_ids indices_type for CUTLASS w8a8 FP8 MoE (#20166)
Signed-off-by: Ming Yang <yming@meta.com>
2025-07-08 23:10:57 +00:00
wang.yuqi
baba0389f7
[CI] Increase the threshold of the MTEB RERANK tests (#20615)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-08 08:10:11 -07:00
viravera
c6c22f16d3
Revert invalid spellchecker fix on deepseek_vl2 (#20618) 2025-07-08 15:07:14 +00:00
Cyrus Leung
dd382e0fe3
[Model] Implement missing get_language_model for Keye-VL (#20631)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-08 07:47:46 -07:00
XiongfeiWei
849590a2a7
Update torch/xla pin to 20250703 (#20589)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-07-08 07:44:02 -07:00
Yan Ma
a4c23314c0
[xpu]feat: support multi-lora on xpu (#20616)
Signed-off-by: yan <yan.ma@intel.com>
2025-07-08 22:07:10 +08:00
Harry Mellor
b942c094e3
Stop using title frontmatter and fix doc that can only be reached by search (#20623)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-08 03:27:40 -07:00
Harry Mellor
b4bab81660
Remove unnecessary explicit title anchors and use relative links instead (#20620)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-08 02:49:13 -07:00
Ricardo Decal
b91cb3fa5c
[Docs] Improve documentation for Deepseek R1 on Ray Serve LLM (#20601)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-08 02:09:06 -07:00
Nicolò Lucchesi
71d1d75b7a
[PD][Nixl] Remote consumer READ timeout for clearing request blocks (#20139)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-08 08:56:40 +01:00
Sanger Steel
72d14d0eed
[Frontend] [Core] Integrate Tensorizer in to S3 loading machinery, allow passing arbitrary arguments during save/load (#19619)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
Co-authored-by: Eta <esyra@coreweave.com>
2025-07-07 22:47:43 -07:00
Chenyaaang
e34d130c16
[TPU] Temporary fix vmem oom for long model len by reducing page size (#20278)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-07-08 05:16:16 +00:00
Li, Jiang
7721ef1786
[CI/Build][CPU] Fix CPU CI and remove all CPU V0 files (#20560)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-07 22:13:44 -07:00
Reid
8369b7c2a9
[Misc] improve error msg (#20604)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-07 21:45:18 -07:00
Ricardo Decal
3eb4ad53f3
[Docs] Add Anyscale to frameworks (#20590)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-07 20:09:13 -07:00
Ricardo Decal
90a2769f20
[Docs] Add Ray Serve LLM section to openai compatible server guide (#20595)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-07 20:08:05 -07:00
Ricardo Decal
e60d422f19
[Docs] Improve docstring for ray data llm example (#20597)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-07 20:06:26 -07:00
Ricardo Decal
0d914c81a2
[Docs] Rewrite offline inference guide (#20594)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-07 20:06:02 -07:00
Harry Mellor
6e428cdd7a
[Doc] Syntax highlight request responses as JSON instead of bash (#20582)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-07 20:02:45 -07:00
Chauncey
93b9d9f499
[Bugfix]: Fix messy code when using logprobs (#19209)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-08 11:02:15 +08:00
Harry Mellor
af107d5a0e
Make distinct code and console admonitions so readers are less likely to miss them (#20585)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-07 19:55:28 -07:00
Woosuk Kwon
31c5d0a1b7
[Optimize] Don't send token ids when kv connector is not used (#20586)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-07 19:04:54 -07:00
Ming Yang
afb7cff1b9
[Bugfix] Fix Maverick correctness by filling zero to cache space in cutlass_moe (#20167)
Signed-off-by: Ming Yang <yming@meta.com>
2025-07-08 01:07:22 +00:00
Kyle Yu
d2e841a10a
[Misc] Improve logging for dynamic shape cache compilation (#20573)
Signed-off-by: kyolebu <kyu@redhat.com>
2025-07-08 00:48:09 +00:00
Patrick von Platen
14601f5fba
[Config] Refactor mistral configs (#20570)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2025-07-07 15:25:10 -07:00
Harry Mellor
042d131f39
Fix links in multi-modal model contributing page (#18615)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-07 21:13:52 +00:00
rongfu.leng
8e807cdfa4
[Misc] feat output content in stream response (#19608)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-07-07 20:45:10 +00:00
Anton
e601efcb10
[Misc] Add fully interleaved support for multimodal 'string' content format (#14047)
Signed-off-by: drobyshev.anton <drobyshev.anton@wb.ru>
Co-authored-by: drobyshev.anton <drobyshev.anton@wb.ru>
2025-07-07 19:43:08 +00:00
jvlunteren
22dd9c2730
[Kernel] Optimize Prefill Attention in Unified Triton Attention Kernel (#20308)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
2025-07-07 19:08:12 +00:00
Rui Qiao
a6d795d593
[DP] Copy environment variables to Ray DPEngineCoreActors (#20344)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-07-07 10:14:22 -07:00
ztang2370
a37d75bbec
[Front-end] microbatch tokenization (#19334)
Signed-off-by: zt2370 <ztang2370@gmail.com>
2025-07-07 17:54:10 +01:00