Ricardo Decal
684174115d
[Docs] Rewrite Distributed Inference and Serving guide ( #20593 )
...
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-24 08:13:05 -07:00
Sanger Steel
cdb79ee63d
[Docs] Update Tensorizer usage documentation ( #21190 )
...
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
Signed-off-by: William Goldby <willgoldby@gmail.com>
Co-authored-by: William Goldby <willgoldby@gmail.com>
2025-07-24 06:56:18 -07:00
elvischenv
5a19a6c670
[Fix] Update mamba_ssm to 2.2.5 ( #21421 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-07-24 03:25:41 -07:00
Harry Mellor
13abd0eaf9
[Model] Officially support Emu3 with Transformers backend ( #21319 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-24 03:22:12 -07:00
Shintarou Okada
6eca337ce0
Replace --expand-tools-even-if-tool-choice-none with --exclude-tools-when-tool-choice-none for v0.10.0 ( #20544 )
...
Signed-off-by: okada <kokuzen@gmail.com>
Signed-off-by: okada shintarou <okada@preferred.jp>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-24 02:56:36 -07:00
deven-labovitch
63d92abb7c
[Frontend] Set MAX_AUDIO_CLIP_FILESIZE_MB via env var instead of hardcoding ( #21374 )
...
Signed-off-by: Deven Labovitch <deven@videa.ai>
2025-07-23 20:22:19 -07:00
Michael Goin
82ec66f514
[V0 Deprecation] Remove Prompt Adapters ( #20588 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-23 16:36:48 -07:00
Asher
2671334d45
[Model] add Hunyuan V1 Dense Model support. ( #21368 )
...
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
2025-07-23 03:54:08 -07:00
Michael Yao
2cc5016a19
[Docs] Clean up v1/metrics.md ( #21449 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-07-23 03:37:25 -07:00
Michael Yao
23637dcdef
[Docs] Fix bullets and grammars in tool_calling.md ( #21440 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-07-23 01:23:20 -07:00
Raushan Turganbay
f38ee34a0a
[feat] Enable mm caching for transformers backend ( #21358 )
...
Signed-off-by: raushan <raushan@huggingface.co>
2025-07-22 08:18:46 -07:00
Raghav Ravishankar
82b8027be6
Add arcee model ( #21296 )
...
Signed-off-by: alyosha-swamy <raghav@arcee.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-22 00:57:43 -07:00
Li, Jiang
5e70dcd6e6
[Doc] Fix CPU doc format ( #21316 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-21 21:47:49 -07:00
Li, Jiang
a15a50fc17
[CPU] Enable shared-memory based pipeline parallel for CPU backend ( #21289 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-21 09:07:08 -07:00
Ning Xie
d97841078b
[Misc] unify variable for LLM instance ( #20996 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-21 12:18:33 +01:00
Harry Mellor
e6b90a2805
[Docs] Make tables more space efficient in supported_models.md ( #21291 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-21 02:25:02 -07:00
Harry Mellor
be54a951a3
[Docs] Fix hardcoded links in docs ( #21287 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-21 02:23:57 -07:00
Cyrus Leung
042af0c8d3
[Model][1/N] Support multiple poolers at model level ( #21227 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-21 02:22:21 -07:00
Raushan Turganbay
9499e26e2a
[Model] Support VLMs with transformers backend ( #20543 )
...
Signed-off-by: raushan <raushan@huggingface.co>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-20 13:25:50 +00:00
Thomas Parnell
2b504eb770
[Docs] [V1] Update docs to remove enforce_eager limitation for hybrid models. ( #21233 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-07-19 16:09:58 -07:00
Yuxuan Zhang
10eb24cc91
GLM-4 Update ( #20736 )
...
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Lu Fang <fanglu@fb.com>
2025-07-19 22:40:31 +00:00
Woosuk Kwon
752c6ade2e
[V0 Deprecation] Deprecate BlockSparse Attention & Phi3-Small ( #21217 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-19 13:53:17 -07:00
Jiayi Yan
6a971ed692
[Docs] Update the link to the 'Prometheus/Grafana' example ( #21225 )
2025-07-19 06:58:07 -07:00
Li, Jiang
e3a0e43d7f
[bugfix] Fix auto thread-binding when world_size > 1 in CPU backend and refactor code ( #21032 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-19 05:13:55 -07:00
김종곤
3e04107d97
[Model] EXAONE 4.0 model support ( #21060 )
...
Signed-off-by: Deepfocused <rlawhdrhs27@gmail.com>
Signed-off-by: woongsik <rlawhdrhs27@gmail.com>
2025-07-19 14:25:44 +08:00
Jee Jee Li
466e878f2a
[Quantization] Enable BNB support for more MoE models ( #21100 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-18 17:52:02 -07:00
Cyrus Leung
55ad648715
[Doc] Fix typo in model name ( #21178 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-18 03:55:10 -07:00
Lucia Fang
b9a21e9173
[Docs] Update supported models documentation with missing models ( #20844 )
...
Signed-off-by: Lu Fang <fanglu@fb.com>
2025-07-17 20:12:13 -07:00
Ricardo Decal
c4e3b12524
[Docs] Add minimal demo of Ray Data API usage ( #21080 )
...
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-17 20:09:19 -07:00
Jee Jee Li
a3a6c695f4
[Misc] Qwen MoE model supports LoRA ( #20932 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-17 18:32:52 +00:00
Harry Mellor
2d6a38209b
[Docs] Move code block out of admonition now that it's short ( #21118 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-17 06:12:29 -07:00
kYLe
4ef00b5cac
[VLM] Add Nemotron-Nano-VL-8B-V1 support ( #20349 )
...
Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-17 03:07:55 -07:00
Asher
5a7fb3ab9e
[Model] Add ToolParser and MoE Config for Hunyuan A13B ( #20820 )
...
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
2025-07-17 09:10:09 +00:00
Zhonghua Deng
8a4e5c5f3c
[V1][P/D]Enhance Performance and code readability for P2pNcclConnector ( #20906 )
...
Signed-off-by: Abatom <abzhonghua@gmail.com>
2025-07-16 22:13:00 -07:00
XiongfeiWei
58760e12b1
[TPU] Start using python 3.12 ( #21000 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-07-16 19:37:44 -07:00
Nir David
01513a334a
Support FP8 Quantization and Inference Run on Intel Gaudi (HPU) using INC (Intel Neural Compressor) ( #12010 )
...
Signed-off-by: Nir David <ndavid@habana.ai>
Signed-off-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
2025-07-16 15:33:41 -04:00
Michael Yao
260127ea54
[Docs] Add intro and fix 1-2-3 list in frameworks/open-webui.md ( #19199 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-07-16 06:11:38 -07:00
Peter Pan
1eb2b9c102
[CI] update typos config for CI pre-commit and fix some spells ( #20919 )
...
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2025-07-15 21:12:40 -07:00
Ricardo Decal
3ed94f9d0a
[Docs] Enhance Anyscale documentation, add quickstart links for vLLM ( #21018 )
...
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-15 19:46:56 -07:00
Harry Mellor
b637e9dcb8
Add full serve CLI reference back to docs ( #20978 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-15 17:42:30 +00:00
Harry Mellor
313ae8c16a
[Deprecation] Remove everything scheduled for removal in v0.10.0 ( #20979 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-15 15:57:53 +00:00
Harry Mellor
56fe4bedd6
[Deprecation] Remove TokenizerPoolConfig ( #20968 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-15 14:00:50 +00:00
Rui Qiao
d91278181d
[doc] Add more details for Ray-based DP ( #20948 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-07-15 05:37:12 -07:00
Thomas Parnell
3534c39a20
[V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli ( #20840 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-07-15 04:04:35 -07:00
Reid
68d28e37b0
[frontend] Add --help=page option for paginated help output ( #20961 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-15 00:42:00 -07:00
Isotr0py
fc017915f5
[Doc] Clearer mistral3 and pixtral model support description ( #20926 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-14 21:56:53 -07:00
Ricardo Decal
054c8657e3
[Docs] Add Kuberay to deployment integrations ( #20592 )
...
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-14 20:13:55 -07:00
ant-yy
38efa28278
[Model] Add Ling implementation ( #20680 )
...
Signed-off-by: vito.yy <vito.yy@antgroup.com>
2025-07-14 22:10:32 +08:00
Reid
a86754a12b
[docs] convert supported configs to table ( #20858 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-12 06:54:50 -07:00
Congcong Chen
2c11a738b3
[Model] New model support for microsoft/Phi-4-mini-flash-reasoning ( #20702 )
...
Signed-off-by: Congcong Chen <congcongchen@microsoft.com>
2025-07-12 06:02:10 -07:00