Chenyaaang
|
e34d130c16
|
[TPU] Temporary fix vmem oom for long model len by reducing page size (#20278)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-07-08 05:16:16 +00:00 |
|
Li, Jiang
|
7721ef1786
|
[CI/Build][CPU] Fix CPU CI and remove all CPU V0 files (#20560)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-07 22:13:44 -07:00 |
|
Reid
|
8369b7c2a9
|
[Misc] improve error msg (#20604)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-07 21:45:18 -07:00 |
|
Ricardo Decal
|
3eb4ad53f3
|
[Docs] Add Anyscale to frameworks (#20590)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
|
2025-07-07 20:09:13 -07:00 |
|
Ricardo Decal
|
90a2769f20
|
[Docs] Add Ray Serve LLM section to openai compatible server guide (#20595)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
|
2025-07-07 20:08:05 -07:00 |
|
Ricardo Decal
|
e60d422f19
|
[Docs] Improve docstring for ray data llm example (#20597)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
|
2025-07-07 20:06:26 -07:00 |
|
Ricardo Decal
|
0d914c81a2
|
[Docs] Rewrite offline inference guide (#20594)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
|
2025-07-07 20:06:02 -07:00 |
|
Harry Mellor
|
6e428cdd7a
|
[Doc] Syntax highlight request responses as JSON instead of bash (#20582)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-07 20:02:45 -07:00 |
|
Chauncey
|
93b9d9f499
|
[Bugfix]: Fix messy code when using logprobs (#19209)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-07-08 11:02:15 +08:00 |
|
Harry Mellor
|
af107d5a0e
|
Make distinct code and console admonitions so readers are less likely to miss them (#20585)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-07 19:55:28 -07:00 |
|
Woosuk Kwon
|
31c5d0a1b7
|
[Optimize] Don't send token ids when kv connector is not used (#20586)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-07 19:04:54 -07:00 |
|
Ming Yang
|
afb7cff1b9
|
[Bugfix] Fix Maverick correctness by filling zero to cache space in cutlass_moe (#20167)
Signed-off-by: Ming Yang <yming@meta.com>
|
2025-07-08 01:07:22 +00:00 |
|
Kyle Yu
|
d2e841a10a
|
[Misc] Improve logging for dynamic shape cache compilation (#20573)
Signed-off-by: kyolebu <kyu@redhat.com>
|
2025-07-08 00:48:09 +00:00 |
|
Patrick von Platen
|
14601f5fba
|
[Config] Refactor mistral configs (#20570)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2025-07-07 15:25:10 -07:00 |
|
Harry Mellor
|
042d131f39
|
Fix links in multi-modal model contributing page (#18615)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-07 21:13:52 +00:00 |
|
rongfu.leng
|
8e807cdfa4
|
[Misc] feat output content in stream response (#19608)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-07-07 20:45:10 +00:00 |
|
Anton
|
e601efcb10
|
[Misc] Add fully interleaved support for multimodal 'string' content format (#14047)
Signed-off-by: drobyshev.anton <drobyshev.anton@wb.ru>
Co-authored-by: drobyshev.anton <drobyshev.anton@wb.ru>
|
2025-07-07 19:43:08 +00:00 |
|
jvlunteren
|
22dd9c2730
|
[Kernel] Optimize Prefill Attention in Unified Triton Attention Kernel (#20308)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
|
2025-07-07 19:08:12 +00:00 |
|
Rui Qiao
|
a6d795d593
|
[DP] Copy environment variables to Ray DPEngineCoreActors (#20344)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-07-07 10:14:22 -07:00 |
|
ztang2370
|
a37d75bbec
|
[Front-end] microbatch tokenization (#19334)
Signed-off-by: zt2370 <ztang2370@gmail.com>
|
2025-07-07 17:54:10 +01:00 |
|
Peter Pan
|
edd270bc78
|
[Bugfix] Prevent IndexError for cached requests when pipeline parallelism is disabled (#20486)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
|
2025-07-07 09:41:15 -07:00 |
|
wang.yuqi
|
110df74332
|
[Model][Last/4] Automatic conversion of CrossEncoding model (#19675)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-07 14:46:04 +00:00 |
|
Harry Mellor
|
1ad69e8375
|
[Doc] Fix some MkDocs snippets used in the installation docs (#20572)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-07 07:44:34 -07:00 |
|
Harry Mellor
|
b8a498c9b2
|
[Doc] Add outline for content tabs (#20571)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-07 07:43:26 -07:00 |
|
Harry Mellor
|
923147b5e8
|
[Doc] Fix internal links so they don't always point to latest (#20563)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-07 04:15:50 -07:00 |
|
Harry Mellor
|
45877ef740
|
[Doc] Use gh-pr and gh-issue everywhere we can in the docs (#20564)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-07 03:54:22 -07:00 |
|
Harry Mellor
|
6e4bef1bea
|
[Doc] Remove extra whitespace from CI failures doc (#20565)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-07 03:35:47 -07:00 |
|
Jee Jee Li
|
4ff79a136e
|
[Misc] Set the minimum openai version (#20539)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-07 09:15:26 +00:00 |
|
Abirdcfly
|
448acad31e
|
[Misc] remove unused jinaai_serving_reranking (#18878)
Signed-off-by: Abirdcfly <fp544037857@gmail.com>
|
2025-07-07 09:14:12 +00:00 |
|
Michael Yao
|
eb0b2d2f08
|
[Docs] Clean up tables in supported_models.md (#20552)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-07-07 01:46:31 -07:00 |
|
Yan Ma
|
3112271f6e
|
[XPU] log clean up for XPU platform (#20553)
Signed-off-by: yan <yan.ma@intel.com>
|
2025-07-07 01:38:22 -07:00 |
|
Michael Yao
|
1fd471e957
|
Add docstrings to url_schemes.py to improve readability (#20545)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-07-07 08:31:49 +00:00 |
|
Liangliang Ma
|
2c5ebec064
|
[XPU][CI] add v1/core test in xpu hardware ci (#20537)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
|
2025-07-07 01:16:40 -07:00 |
|
Jee Jee Li
|
2e610deb72
|
[CI/Build] Enable phi2 lora test (#20540)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-07 05:10:41 +00:00 |
|
Yang Yang
|
6e2c19ce22
|
[Refactor]Abstract Platform Interface for Distributed Backend and Add xccl Support for Intel XPU (#19410)
Signed-off-by: dbyoung18 <yang5.yang@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-07-07 04:32:32 +00:00 |
|
Reid
|
47db8c2c15
|
[Misc] add a tip for pre-commit (#20536)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-06 19:42:06 -07:00 |
|
Woosuk Kwon
|
462b269280
|
Implement OpenAI Responses API [1/N] (#20504)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-06 18:32:13 -07:00 |
|
Cyrus Leung
|
c18b3b8e8b
|
[Bugfix] Add use_cross_encoder flag to use correct activation in ClassifierPooler (#20527)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-06 14:01:48 -07:00 |
|
Woosuk Kwon
|
9528e3a05e
|
[BugFix][Spec Decode] Fix spec token ids in model runner (#20530)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-06 19:44:52 +00:00 |
|
Cyrus Leung
|
9fb52e523a
|
[V1] Support any head size for FlexAttention backend (#20467)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-06 09:54:36 -07:00 |
|
Woosuk Kwon
|
e202dd2736
|
[V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2025-07-06 08:48:13 -07:00 |
|
Reid
|
43813e6361
|
[Misc] call the pre-defined func (#20518)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-06 10:25:29 +00:00 |
|
Brayden Zhong
|
cede942b87
|
[Benchmark] Add support for multiple batch size benchmark through CLI in benchmark_moe.py (#20516)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-07-06 09:20:11 +00:00 |
|
Flora Feng
|
fe1e924811
|
[Frontend] Support image object in llm.chat (#19635)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Flora Feng <4florafeng@gmail.com>
|
2025-07-06 06:47:13 +00:00 |
|
Chengji Yao
|
4548c03c50
|
[TPU][Bugfix] fix the MoE OOM issue (#20339)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-07-05 21:19:09 -07:00 |
|
Lucas Wilkinson
|
40b86aa05e
|
[BugFix] Fix: ImportError when building on hopper systems (#20513)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-07-06 12:17:30 +08:00 |
|
Lucia Fang
|
432870829d
|
[Bugfix] Fix missing per_act_token parameter in compressed_tensors_moe (#20509)
Signed-off-by: Lu Fang <fanglu@fb.com>
|
2025-07-06 12:08:30 +08:00 |
|
Vadim Gimpelson
|
f73d02aadc
|
[BUG] Fix #20484. Support empty sequence in cuda penalty kernel (#20491)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>
|
2025-07-05 19:38:02 -07:00 |
|
Jeremy Reizenstein
|
c5ebe040ac
|
test_attention compat with coming xformers change (#20487)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-07-05 19:37:59 -07:00 |
|
Reid
|
8d763cb891
|
[Misc] remove unused import (#20517)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-05 19:17:06 -07:00 |
|