3384 Commits

Author SHA1 Message Date
Aleksandr Malyshev
812c981fa0
Splitting attention kernel file (#10091)
Signed-off-by: maleksan85 <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2024-11-11 22:55:07 -08:00
Jee Jee Li
7f5edb5900
[Misc][LoRA] Replace hardcoded cuda device with configurable argument (#10223)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-12 11:10:15 +08:00
youkaichao
eea55cca5b
[1/N] torch.compile user interface design (#10237)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 18:01:06 -08:00
Russell Bryant
9cdba9669c
[Doc] Update help text for --distributed-executor-backend (#10231)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-12 09:55:09 +08:00
youkaichao
d1c6799b88
[doc] update debugging guide (#10236)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 15:21:12 -08:00
Robert Shaw
6ace6fba2c
[V1] AsyncLLM Implementation (#9826)
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-11-11 23:05:38 +00:00
Nikolai Shcheglov
08f93e7439
Make shutil rename in python_only_dev (#10233)
Signed-off-by: shcheglovnd <shcheglovnd@avride.ai>
2024-11-11 14:29:19 -08:00
Woosuk Kwon
9d5b4e4dea
[V1] Enable custom ops with piecewise CUDA graphs (#10228)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-11 11:58:07 -08:00
youkaichao
8a7fe47d32
[misc][distributed] auto port selection and disable tests (#10226)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 11:54:59 -08:00
Yuan Tang
4800339c62
Add docs on serving with Llama Stack (#10183)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2024-11-11 11:28:55 -08:00
Woosuk Kwon
fe15729a2b
[V1] Use custom ops for piecewise CUDA graphs (#10227)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-11 11:26:48 -08:00
youkaichao
330e82d34a
[v1][torch.compile] support managing cudagraph buffer (#10203)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-11 11:10:27 -08:00
Woosuk Kwon
d7a4f2207b
[V1] Do not use inductor for piecewise CUDA graphs (#10225)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-11 11:05:57 -08:00
Woosuk Kwon
f9dadfbee3
[V1] Fix detokenizer ports (#10224)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-11 10:42:07 -08:00
dependabot[bot]
25144ceed0
Bump actions/setup-python from 5.2.0 to 5.3.0 (#10209)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-11 17:24:10 +00:00
youkaichao
e6de9784d2
[core][distributed] add stateless process group (#10216)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 09:02:14 -08:00
Yangcheng Li
36fc439de0
[Doc] fix doc string typo in block_manager swap_out function (#10212) 2024-11-11 08:53:07 -08:00
harrywu
874f551b36
[Metrics] add more metrics (#4464)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-12 00:17:38 +08:00
Isotr0py
2cebda42bb
[Bugfix][Hardware][CPU] Fix broken encoder-decoder CPU runner (#10218)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-11 12:37:58 +00:00
Roger Wang
5fb1f935b0
[V1] Allow tokenizer_mode and trust_remote_code for Detokenizer (#10211)
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-11-11 18:01:18 +08:00
Jee Jee Li
36e4acd02a
[LoRA][Kernel] Remove the unused libentry module (#10214)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-11 09:43:23 +00:00
Isotr0py
58170d6503
[Hardware][CPU] Add embedding models support for CPU backend (#10193)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-11 08:54:28 +00:00
dependabot[bot]
9804ac7c7c
Bump the patch-update group with 5 updates (#10210)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-11 07:22:40 +00:00
youkaichao
f89d18ff74
[6/N] pass whole config to inner model (#10205)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 06:41:46 +00:00
youkaichao
f0f2e5638e
[doc] improve debugging code (#10206)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-10 17:49:40 -08:00
yansh97
ad9a78bf64
[Doc] Fix typo error in vllm/entrypoints/openai/cli_args.py (#10196) 2024-11-11 00:14:22 +00:00
youkaichao
73b9083e99
[misc] improve cloudpickle registration and tests (#10202)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 00:10:53 +00:00
Shawn Du
20cf2f553c
[Misc] small fixes to function tracing file path (#9543)
Signed-off-by: Shawn Du <shawnd200@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-10 15:21:06 -08:00
Yongzao
bfb7d61a7c
[doc] Polish the integration with huggingface doc (#10195)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-10 10:22:04 -08:00
FuryMartin
19682023b6
[Doc] Fix typo error in CONTRIBUTING.md (#10190)
Signed-off-by: FuryMartin <furymartin9910@outlook.com>
2024-11-10 07:47:24 +00:00
youkaichao
9fa4bdde9d
[ci][build] limit cmake version (#10188)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-09 16:27:26 -08:00
Cyrus Leung
51c2e1fcef
[CI/Build] Split up models tests (#10069)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-09 11:39:14 -08:00
Krishna Mandal
b09895a618
[Frontend][Core] Override HF config.json via CLI (#5836)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-09 16:19:27 +00:00
cjackal
d88bff1b96
[Frontend] add add_request_id middleware (#9594)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
2024-11-09 10:18:29 +00:00
Zhao Yingzhuo
9e37266420
bugfix: fix the bug that stream generate not work (#2756) 2024-11-09 10:09:48 +00:00
youkaichao
8a4358ecb5
[doc] explaining the integration with huggingface (#10173)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-09 01:02:54 -08:00
youkaichao
bd46357ad9
[bugfix] fix broken tests of mlp speculator (#10177)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-09 00:04:50 -08:00
bnellnm
f192aeba74
[Bugfix] Enable some fp8 and quantized fullgraph tests (#10171)
Signed-off-by: Bill Nell <bill@neuralmagic.com>
2024-11-09 08:01:27 +00:00
Chendi.Xue
8e1529dc57
[CI/Build] Add run-hpu-test.sh script (#10167)
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
2024-11-09 06:26:52 +00:00
youkaichao
1a95f10ee7
[5/N] pass the whole config to model (#9983)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-09 14:17:28 +08:00
Cyrus Leung
49d2a41a86
[Doc] Adjust RunLLM location (#10176)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-08 20:07:10 -08:00
Isotr0py
47672f38b5
[CI/Build] Fix VLM broadcast tests tensor_parallel_size passing (#10161)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-09 04:02:59 +00:00
Michael Goin
f83feccd7f
[Bugfix] Ignore GPTQ quantization of Qwen2-VL visual module (#10169)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-11-09 03:36:46 +00:00
Cyrus Leung
e0191a95d8
[0/N] Rename MultiModalInputs to MultiModalKwargs (#10040)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-09 11:31:02 +08:00
Li, Jiang
d7edca1dee
[CI/Build] Adding timeout in CPU CI to avoid CPU test queue blocking (#6892)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-09 03:27:11 +00:00
rasmith
127c07480e
[Kernel][Triton] Add Triton implementation for scaled_mm_triton to support fp8 and int8 SmoothQuant, symmetric case (#9857)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2024-11-08 19:59:22 -05:00
bnellnm
10b67d865d
[Bugfix] SymIntArrayRef expected to contain concrete integers (#10170)
Signed-off-by: Bill Nell <bill@neuralmagic.com>
2024-11-08 14:44:18 -08:00
Luka Govedič
4f93dfe952
[torch.compile] Fuse RMSNorm with quant (#9138)
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: youkaichao <youkaichao@126.com>
2024-11-08 21:20:08 +00:00
Florian Zimmermeister
e1b5a82179
Rename vllm.logging to vllm.logging_utils (#10134) 2024-11-08 20:53:24 +00:00
Luka Govedič
87713c6053
[CI/Build] Ignore .gitignored files for shellcheck (#10162)
Signed-off-by: luka <luka@neuralmagic.com>
2024-11-08 19:53:36 +00:00