Shintarou Okada
|
6eca337ce0
|
Replace --expand-tools-even-if-tool-choice-none with --exclude-tools-when-tool-choice-none for v0.10.0 (#20544)
Signed-off-by: okada <kokuzen@gmail.com>
Signed-off-by: okada shintarou <okada@preferred.jp>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-24 02:56:36 -07:00 |
|
deven-labovitch
|
63d92abb7c
|
[Frontend] Set MAX_AUDIO_CLIP_FILESIZE_MB via env var instead of hardcoding (#21374)
Signed-off-by: Deven Labovitch <deven@videa.ai>
|
2025-07-23 20:22:19 -07:00 |
|
Michael Goin
|
82ec66f514
|
[V0 Deprecation] Remove Prompt Adapters (#20588)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-23 16:36:48 -07:00 |
|
Asher
|
2671334d45
|
[Model] add Hunyuan V1 Dense Model support. (#21368)
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
|
2025-07-23 03:54:08 -07:00 |
|
Michael Yao
|
2cc5016a19
|
[Docs] Clean up v1/metrics.md (#21449)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-07-23 03:37:25 -07:00 |
|
Michael Yao
|
23637dcdef
|
[Docs] Fix bullets and grammars in tool_calling.md (#21440)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-07-23 01:23:20 -07:00 |
|
Raushan Turganbay
|
f38ee34a0a
|
[feat] Enable mm caching for transformers backend (#21358)
Signed-off-by: raushan <raushan@huggingface.co>
|
2025-07-22 08:18:46 -07:00 |
|
Raghav Ravishankar
|
82b8027be6
|
Add arcee model (#21296)
Signed-off-by: alyosha-swamy <raghav@arcee.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-22 00:57:43 -07:00 |
|
Li, Jiang
|
5e70dcd6e6
|
[Doc] Fix CPU doc format (#21316)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-21 21:47:49 -07:00 |
|
Li, Jiang
|
a15a50fc17
|
[CPU] Enable shared-memory based pipeline parallel for CPU backend (#21289)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-21 09:07:08 -07:00 |
|
Ning Xie
|
d97841078b
|
[Misc] unify variable for LLM instance (#20996)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-07-21 12:18:33 +01:00 |
|
Harry Mellor
|
e6b90a2805
|
[Docs] Make tables more space efficient in supported_models.md (#21291)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-21 02:25:02 -07:00 |
|
Harry Mellor
|
be54a951a3
|
[Docs] Fix hardcoded links in docs (#21287)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-21 02:23:57 -07:00 |
|
Cyrus Leung
|
042af0c8d3
|
[Model][1/N] Support multiple poolers at model level (#21227)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-21 02:22:21 -07:00 |
|
Raushan Turganbay
|
9499e26e2a
|
[Model] Support VLMs with transformers backend (#20543)
Signed-off-by: raushan <raushan@huggingface.co>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-07-20 13:25:50 +00:00 |
|
Thomas Parnell
|
2b504eb770
|
[Docs] [V1] Update docs to remove enforce_eager limitation for hybrid models. (#21233)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-07-19 16:09:58 -07:00 |
|
Yuxuan Zhang
|
10eb24cc91
|
GLM-4 Update (#20736)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Lu Fang <fanglu@fb.com>
|
2025-07-19 22:40:31 +00:00 |
|
Woosuk Kwon
|
752c6ade2e
|
[V0 Deprecation] Deprecate BlockSparse Attention & Phi3-Small (#21217)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-19 13:53:17 -07:00 |
|
Jiayi Yan
|
6a971ed692
|
[Docs] Update the link to the 'Prometheus/Grafana' example (#21225)
|
2025-07-19 06:58:07 -07:00 |
|
Li, Jiang
|
e3a0e43d7f
|
[bugfix] Fix auto thread-binding when world_size > 1 in CPU backend and refactor code (#21032)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-19 05:13:55 -07:00 |
|
김종곤
|
3e04107d97
|
[Model] EXAONE 4.0 model support (#21060)
Signed-off-by: Deepfocused <rlawhdrhs27@gmail.com>
Signed-off-by: woongsik <rlawhdrhs27@gmail.com>
|
2025-07-19 14:25:44 +08:00 |
|
Jee Jee Li
|
466e878f2a
|
[Quantization] Enable BNB support for more MoE models (#21100)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-18 17:52:02 -07:00 |
|
Cyrus Leung
|
55ad648715
|
[Doc] Fix typo in model name (#21178)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-18 03:55:10 -07:00 |
|
Lucia Fang
|
b9a21e9173
|
[Docs] Update supported models documentation with missing models (#20844)
Signed-off-by: Lu Fang <fanglu@fb.com>
|
2025-07-17 20:12:13 -07:00 |
|
Ricardo Decal
|
c4e3b12524
|
[Docs] Add minimal demo of Ray Data API usage (#21080)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
|
2025-07-17 20:09:19 -07:00 |
|
Jee Jee Li
|
a3a6c695f4
|
[Misc] Qwen MoE model supports LoRA (#20932)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-17 18:32:52 +00:00 |
|
Harry Mellor
|
2d6a38209b
|
[Docs] Move code block out of admonition now that it's short (#21118)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-17 06:12:29 -07:00 |
|
kYLe
|
4ef00b5cac
|
[VLM] Add Nemotron-Nano-VL-8B-V1 support (#20349)
Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-07-17 03:07:55 -07:00 |
|
Asher
|
5a7fb3ab9e
|
[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820)
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
|
2025-07-17 09:10:09 +00:00 |
|
Zhonghua Deng
|
8a4e5c5f3c
|
[V1][P/D]Enhance Performance and code readability for P2pNcclConnector (#20906)
Signed-off-by: Abatom <abzhonghua@gmail.com>
|
2025-07-16 22:13:00 -07:00 |
|
XiongfeiWei
|
58760e12b1
|
[TPU] Start using python 3.12 (#21000)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
|
2025-07-16 19:37:44 -07:00 |
|
Nir David
|
01513a334a
|
Support FP8 Quantization and Inference Run on Intel Gaudi (HPU) using INC (Intel Neural Compressor) (#12010)
Signed-off-by: Nir David <ndavid@habana.ai>
Signed-off-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
|
2025-07-16 15:33:41 -04:00 |
|
Michael Yao
|
260127ea54
|
[Docs] Add intro and fix 1-2-3 list in frameworks/open-webui.md (#19199)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-07-16 06:11:38 -07:00 |
|
Peter Pan
|
1eb2b9c102
|
[CI] update typos config for CI pre-commit and fix some spells (#20919)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
|
2025-07-15 21:12:40 -07:00 |
|
Ricardo Decal
|
3ed94f9d0a
|
[Docs] Enhance Anyscale documentation, add quickstart links for vLLM (#21018)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
|
2025-07-15 19:46:56 -07:00 |
|
Harry Mellor
|
b637e9dcb8
|
Add full serve CLI reference back to docs (#20978)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-15 17:42:30 +00:00 |
|
Harry Mellor
|
313ae8c16a
|
[Deprecation] Remove everything scheduled for removal in v0.10.0 (#20979)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-15 15:57:53 +00:00 |
|
Harry Mellor
|
56fe4bedd6
|
[Deprecation] Remove TokenizerPoolConfig (#20968)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-15 14:00:50 +00:00 |
|
Rui Qiao
|
d91278181d
|
[doc] Add more details for Ray-based DP (#20948)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-07-15 05:37:12 -07:00 |
|
Thomas Parnell
|
3534c39a20
|
[V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli (#20840)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-07-15 04:04:35 -07:00 |
|
Reid
|
68d28e37b0
|
[frontend] Add --help=page option for paginated help output (#20961)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-15 00:42:00 -07:00 |
|
Isotr0py
|
fc017915f5
|
[Doc] Clearer mistral3 and pixtral model support description (#20926)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-14 21:56:53 -07:00 |
|
Ricardo Decal
|
054c8657e3
|
[Docs] Add Kuberay to deployment integrations (#20592)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
|
2025-07-14 20:13:55 -07:00 |
|
ant-yy
|
38efa28278
|
[Model] Add Ling implementation (#20680)
Signed-off-by: vito.yy <vito.yy@antgroup.com>
|
2025-07-14 22:10:32 +08:00 |
|
Reid
|
a86754a12b
|
[docs] convert supported configs to table (#20858)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-12 06:54:50 -07:00 |
|
Congcong Chen
|
2c11a738b3
|
[Model] New model support for microsoft/Phi-4-mini-flash-reasoning (#20702)
Signed-off-by: Congcong Chen <congcongchen@microsoft.com>
|
2025-07-12 06:02:10 -07:00 |
|
Lucia Fang
|
fb25e95688
|
[Docs] Update basic.md (#20846)
|
2025-07-11 23:05:32 -07:00 |
|
bigmoyan
|
5f0af36af5
|
Update kimi-k2 tool calling docs, enable unit tests (#20821)
Signed-off-by: wangzhengtao <wangzhengtao@moonshot.cn>
Co-authored-by: wangzhengtao <wangzhengtao@moonshot.cn>
Co-authored-by: wangzhengtao <wangzhengtao@msh.team>
|
2025-07-11 20:16:14 +00:00 |
|
Nick Hill
|
9907fc4494
|
[Docs] Data Parallel deployment documentation (#20768)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-11 09:42:10 -07:00 |
|
Reid
|
6fb162447b
|
[doc] fix ordered list issue (#20819)
Signed-off-by: reidliu41 <reid201711@gmail.com>
|
2025-07-11 06:49:46 -07:00 |
|