xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-18 19:37:15 +08:00

Author	SHA1	Message	Date
Chenyaaang	e34d130c16	[TPU] Temporary fix vmem oom for long model len by reducing page size (#20278 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-07-08 05:16:16 +00:00
Li, Jiang	7721ef1786	[CI/Build][CPU] Fix CPU CI and remove all CPU V0 files (#20560 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-07 22:13:44 -07:00
Reid	8369b7c2a9	[Misc] improve error msg (#20604 ) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-07 21:45:18 -07:00
Ricardo Decal	3eb4ad53f3	[Docs] Add Anyscale to frameworks (#20590 ) Signed-off-by: Ricardo Decal <rdecal@anyscale.com>	2025-07-07 20:09:13 -07:00
Ricardo Decal	90a2769f20	[Docs] Add Ray Serve LLM section to openai compatible server guide (#20595 ) Signed-off-by: Ricardo Decal <rdecal@anyscale.com>	2025-07-07 20:08:05 -07:00
Ricardo Decal	e60d422f19	[Docs] Improve docstring for ray data llm example (#20597 ) Signed-off-by: Ricardo Decal <rdecal@anyscale.com>	2025-07-07 20:06:26 -07:00
Ricardo Decal	0d914c81a2	[Docs] Rewrite offline inference guide (#20594 ) Signed-off-by: Ricardo Decal <rdecal@anyscale.com>	2025-07-07 20:06:02 -07:00
Harry Mellor	6e428cdd7a	[Doc] Syntax highlight request responses as JSON instead of bash (#20582 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-07 20:02:45 -07:00
Chauncey	93b9d9f499	[Bugfix]: Fix messy code when using logprobs (#19209 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-07-08 11:02:15 +08:00
Harry Mellor	af107d5a0e	Make distinct `code` and `console` admonitions so readers are less likely to miss them (#20585 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-07 19:55:28 -07:00
Woosuk Kwon	31c5d0a1b7	[Optimize] Don't send token ids when kv connector is not used (#20586 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-07 19:04:54 -07:00
Ming Yang	afb7cff1b9	[Bugfix] Fix Maverick correctness by filling zero to cache space in cutlass_moe (#20167 ) Signed-off-by: Ming Yang <yming@meta.com>	2025-07-08 01:07:22 +00:00
Kyle Yu	d2e841a10a	[Misc] Improve logging for dynamic shape cache compilation (#20573 ) Signed-off-by: kyolebu <kyu@redhat.com>	2025-07-08 00:48:09 +00:00
Patrick von Platen	14601f5fba	[Config] Refactor mistral configs (#20570 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>	2025-07-07 15:25:10 -07:00
Harry Mellor	042d131f39	Fix links in multi-modal model contributing page (#18615 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-07 21:13:52 +00:00
rongfu.leng	8e807cdfa4	[Misc] feat output content in stream response (#19608 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-07-07 20:45:10 +00:00
Anton	e601efcb10	[Misc] Add fully interleaved support for multimodal 'string' content format (#14047 ) Signed-off-by: drobyshev.anton <drobyshev.anton@wb.ru> Co-authored-by: drobyshev.anton <drobyshev.anton@wb.ru>	2025-07-07 19:43:08 +00:00
jvlunteren	22dd9c2730	[Kernel] Optimize Prefill Attention in Unified Triton Attention Kernel (#20308 ) Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>	2025-07-07 19:08:12 +00:00
Rui Qiao	a6d795d593	[DP] Copy environment variables to Ray DPEngineCoreActors (#20344 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-07-07 10:14:22 -07:00
ztang2370	a37d75bbec	[Front-end] microbatch tokenization (#19334 ) Signed-off-by: zt2370 <ztang2370@gmail.com>	2025-07-07 17:54:10 +01:00
Peter Pan	edd270bc78	[Bugfix] Prevent IndexError for cached requests when pipeline parallelism is disabled (#20486 ) Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>	2025-07-07 09:41:15 -07:00
wang.yuqi	110df74332	[Model][Last/4] Automatic conversion of CrossEncoding model (#19675 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-07-07 14:46:04 +00:00
Harry Mellor	1ad69e8375	[Doc] Fix some MkDocs snippets used in the installation docs (#20572 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-07 07:44:34 -07:00
Harry Mellor	b8a498c9b2	[Doc] Add outline for content tabs (#20571 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-07 07:43:26 -07:00
Harry Mellor	923147b5e8	[Doc] Fix internal links so they don't always point to latest (#20563 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-07 04:15:50 -07:00
Harry Mellor	45877ef740	[Doc] Use `gh-pr` and `gh-issue` everywhere we can in the docs (#20564 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-07 03:54:22 -07:00
Harry Mellor	6e4bef1bea	[Doc] Remove extra whitespace from CI failures doc (#20565 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-07 03:35:47 -07:00
Jee Jee Li	4ff79a136e	[Misc] Set the minimum openai version (#20539 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-07 09:15:26 +00:00
Abirdcfly	448acad31e	[Misc] remove unused jinaai_serving_reranking (#18878 ) Signed-off-by: Abirdcfly <fp544037857@gmail.com>	2025-07-07 09:14:12 +00:00
Michael Yao	eb0b2d2f08	[Docs] Clean up tables in supported_models.md (#20552 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-07-07 01:46:31 -07:00
Yan Ma	3112271f6e	[XPU] log clean up for XPU platform (#20553 ) Signed-off-by: yan <yan.ma@intel.com>	2025-07-07 01:38:22 -07:00
Michael Yao	1fd471e957	Add docstrings to url_schemes.py to improve readability (#20545 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-07-07 08:31:49 +00:00
Liangliang Ma	2c5ebec064	[XPU][CI] add v1/core test in xpu hardware ci (#20537 ) Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>	2025-07-07 01:16:40 -07:00
Jee Jee Li	2e610deb72	[CI/Build] Enable phi2 lora test (#20540 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-07 05:10:41 +00:00
Yang Yang	6e2c19ce22	[Refactor]Abstract Platform Interface for Distributed Backend and Add xccl Support for Intel XPU (#19410 ) Signed-off-by: dbyoung18 <yang5.yang@intel.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-07-07 04:32:32 +00:00
Reid	47db8c2c15	[Misc] add a tip for pre-commit (#20536 ) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-06 19:42:06 -07:00
Woosuk Kwon	462b269280	Implement OpenAI Responses API [1/N] (#20504 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-06 18:32:13 -07:00
Cyrus Leung	c18b3b8e8b	[Bugfix] Add `use_cross_encoder` flag to use correct activation in `ClassifierPooler` (#20527 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-06 14:01:48 -07:00
Woosuk Kwon	9528e3a05e	[BugFix][Spec Decode] Fix spec token ids in model runner (#20530 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-06 19:44:52 +00:00
Cyrus Leung	9fb52e523a	[V1] Support any head size for FlexAttention backend (#20467 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-06 09:54:36 -07:00
Woosuk Kwon	e202dd2736	[V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: jiang1.li <jiang1.li@intel.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2025-07-06 08:48:13 -07:00
Reid	43813e6361	[Misc] call the pre-defined func (#20518 ) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-06 10:25:29 +00:00
Brayden Zhong	cede942b87	[Benchmark] Add support for multiple batch size benchmark through CLI in `benchmark_moe.py` (#20516 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-07-06 09:20:11 +00:00
Flora Feng	fe1e924811	[Frontend] Support image object in llm.chat (#19635 ) Signed-off-by: sfeng33 <4florafeng@gmail.com> Signed-off-by: Flora Feng <4florafeng@gmail.com>	2025-07-06 06:47:13 +00:00
Chengji Yao	4548c03c50	[TPU][Bugfix] fix the MoE OOM issue (#20339 ) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-07-05 21:19:09 -07:00
Lucas Wilkinson	40b86aa05e	[BugFix] Fix: ImportError when building on hopper systems (#20513 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-07-06 12:17:30 +08:00
Lucia Fang	432870829d	[Bugfix] Fix missing per_act_token parameter in compressed_tensors_moe (#20509 ) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-07-06 12:08:30 +08:00
Vadim Gimpelson	f73d02aadc	[BUG] Fix #20484 . Support empty sequence in cuda penalty kernel (#20491 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>	2025-07-05 19:38:02 -07:00
Jeremy Reizenstein	c5ebe040ac	test_attention compat with coming xformers change (#20487 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-05 19:37:59 -07:00
Reid	8d763cb891	[Misc] remove unused import (#20517 ) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-05 19:17:06 -07:00


7638 Commits