Joe Runde
ef7faad1b8
🐛 Fixup more test failures from memory profiling ( #9563 )
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-21 17:10:56 -07:00
Kuntai Du
575dcebe9a
[CI] Make format checker error message more user-friendly by using emoji ( #9564 )
This PR makes the format checker's error messages more user-friendly by adding emojis.
2024-10-21 23:45:15 +00:00
Wallas Henrique
711f3a7806
[Frontend] Don't log duplicate error stacktrace for every request in the batch ( #9023 )
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2024-10-21 14:49:41 -07:00
Nick Hill
15713e3b75
[BugFix] Update draft model TP size check to allow matching target TP size ( #9394 )
Co-authored-by: Baoyuan Qi <qibaoyuan@126.com>
2024-10-21 14:14:29 -07:00
youkaichao
d621c43df7
[doc] fix format ( #9562 )
2024-10-21 13:54:57 -07:00
Nick Hill
9d9186be97
[Frontend] Reduce frequency of client cancellation checking ( #7959 )
2024-10-21 13:28:10 -07:00
Michael Goin
5241aa1494
[Model][Bugfix] Fix batching with multi-image in PixtralHF ( #9518 )
2024-10-21 14:20:07 -04:00
Varad Ahirwadkar
ec6bd6c4c6
[BugFix] Use correct python3 binary in Docker.ppc64le entrypoint ( #9492 )
Signed-off-by: Varad Ahirwadkar <varad.ahirwadkar1@ibm.com>
2024-10-21 17:43:02 +00:00
yudian0504
8ca8954841
[Bugfix][Misc]: fix graph capture for decoder ( #9549 )
2024-10-21 17:33:30 +00:00
Dhia Eddine Rhaiem
f6b97293aa
[Model] FalconMamba Support ( #9325 )
2024-10-21 12:50:16 -04:00
Thomas Parnell
496e991da8
[Doc] Consistent naming of attention backends ( #9498 )
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-10-21 22:29:57 +08:00
Cyrus Leung
696b01af8f
[CI/Build] Split up decoder-only LM tests ( #9488 )
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-10-20 21:27:50 -07:00
Andy Dai
855e0e6f97
[Frontend][Misc] Goodput metric support ( #9338 )
2024-10-20 18:39:32 +00:00
Chen Zhang
4fa3e33349
[Kernel] Support sliding window in flash attention backend ( #9403 )
2024-10-20 10:57:52 -07:00
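For context on what "sliding window" means in the entry above, here is a conceptual PyTorch sketch of the attention mask; it is illustrative only and is not the flash-attention kernel code touched by #9403, and the sequence length and window size are made-up values.

    import torch

    # Conceptual sketch only: each query token attends to itself and at most
    # (window - 1) preceding tokens, never to future tokens.
    def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
        idx = torch.arange(seq_len)
        causal = idx[None, :] <= idx[:, None]             # no attention to future tokens
        in_window = idx[:, None] - idx[None, :] < window  # stay within the look-back window
        return causal & in_window

    print(sliding_window_mask(seq_len=6, window=3).int())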
Michael Goin
962d2c6349
[Model][Pixtral] Use memory_efficient_attention for PixtralHFVision ( #9520 )
2024-10-20 05:29:14 +00:00
Chen Zhang
5b59fe0f08
[Bugfix] Pass json-schema to GuidedDecodingParams and make test stronger ( #9530 )
2024-10-20 00:05:02 +00:00
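A minimal offline-inference sketch of passing a JSON schema through GuidedDecodingParams, assuming the vLLM API of this era (GuidedDecodingParams in vllm.sampling_params, attached via SamplingParams.guided_decoding); the model id and schema are placeholders, not taken from the PR.

    from vllm import LLM, SamplingParams
    from vllm.sampling_params import GuidedDecodingParams

    # Placeholder JSON schema for the structured output.
    schema = {
        "type": "object",
        "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
        "required": ["name", "age"],
    }

    params = SamplingParams(
        max_tokens=128,
        guided_decoding=GuidedDecodingParams(json=schema),  # constrain output to the schema
    )

    llm = LLM(model="facebook/opt-125m")  # placeholder model id
    out = llm.generate("Describe a person as JSON:", params)
    print(out[0].outputs[0].text)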
Michael Goin
8e3e7f2713
[Model][Pixtral] Optimizations for input_processor_for_pixtral_hf ( #9514 )
2024-10-19 10:44:29 -04:00
Cyrus Leung
263d8ee150
[Bugfix] Fix missing task for speculative decoding ( #9524 )
2024-10-19 06:49:40 +00:00
Yue Zhang
c5eea3c8ba
[Frontend] Support simpler image input format ( #9478 )
2024-10-18 23:17:07 -07:00
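The exact shape of the "simpler" input format is described in #9478; the sketch below shows the general offline multimodal-prompt convention (a prompt plus a multi_modal_data dict) with a placeholder image path and an example vision-language model.

    from PIL import Image
    from vllm import LLM

    image = Image.open("example.jpg")  # placeholder local image

    llm = LLM(model="llava-hf/llava-1.5-7b-hf")  # example vision-language model
    outputs = llm.generate({
        "prompt": "USER: <image>\nWhat is in this picture? ASSISTANT:",
        "multi_modal_data": {"image": image},
    })
    print(outputs[0].outputs[0].text)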
Russell Bryant
85dc92fc98
[CI/Build] Configure matcher for actionlint workflow ( #9511 )
Signed-off-by: Russell Bryant <russell.bryant@gmail.com>
2024-10-19 06:04:18 +00:00
Russell Bryant
dfd951ed9b
[CI/Build] Add error matching for ruff output ( #9513 )
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-19 05:42:20 +00:00
Joe Runde
82c25151ec
[Doc] update gpu-memory-utilization flag docs ( #9507 )
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-19 11:26:36 +08:00
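For reference, gpu_memory_utilization caps the fraction of a GPU's memory that vLLM may claim for weights, activations, and KV cache (it defaults to 0.9); a minimal sketch with a placeholder model:

    from vllm import LLM

    # Lower the cap when sharing the GPU with other processes.
    llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.8)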
Nick Hill
1325872ec8
[Frontend] Avoid creating guided decoding LogitsProcessor unnecessarily ( #9521 )
2024-10-18 20:21:01 -07:00
Joe Runde
380e18639f
🐛 fix torch memory profiling ( #9516 )
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-18 21:25:19 -04:00
sasha0552
337ed76671
[Bugfix] Fix offline mode when using mistral_common ( #9457 )
2024-10-18 18:12:32 -07:00
Thomas Parnell
0c9a5258f9
[Kernel] Add env variable to force flashinfer backend to enable tensor cores ( #9497 )
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-10-18 17:55:48 -07:00
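A sketch of opting into the behavior added by #9497. The variable name VLLM_FLASHINFER_FORCE_TENSOR_CORES is an assumption and should be checked against vllm/envs.py; VLLM_ATTENTION_BACKEND=FLASHINFER is the usual way to select the FlashInfer backend.

    import os

    # Assumed variable name; verify against vllm/envs.py. Set before vLLM starts.
    os.environ["VLLM_FLASHINFER_FORCE_TENSOR_CORES"] = "1"
    os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

    from vllm import LLM
    llm = LLM(model="facebook/opt-125m")  # placeholder model id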
Cody Yu
d11bf435a0
[MISC] Consolidate cleanup() and refactor offline_inference_with_prefix.py ( #9510 )
2024-10-18 14:30:55 -07:00
Kunjan
9bb10a7d27
[MISC] Add lora requests to metrics ( #9477 )
Co-authored-by: Kunjan Patel <kunjanp_google_com@vllm.us-central1-a.c.kunjanp-gke-dev-2.internal>
2024-10-18 20:50:18 +00:00
Michael Goin
3921a2f29e
[Model] Support Pixtral models in the HF Transformers format ( #9036 )
2024-10-18 13:29:56 -06:00
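With #9036, Pixtral checkpoints published in the HF Transformers format load like any other vLLM model; the repo id below is the community HF-format upload and is given as an example rather than taken from the PR.

    from vllm import LLM

    # Example HF-format Pixtral checkpoint; multimodal prompts then follow the
    # usual prompt + multi_modal_data convention.
    llm = LLM(model="mistral-community/pixtral-12b", max_model_len=8192)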
Russell Bryant
67a7e5ef38
[CI/Build] Add error matching config for mypy ( #9512 )
2024-10-18 12:17:53 -07:00
Cyrus Leung
051eaf6db3
[Model] Add user-configurable task for models that support both generation and embedding ( #9424 )
2024-10-18 11:31:58 -07:00
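A minimal sketch of the user-configurable task from #9424, assuming the task keyword on LLM (with "generate" and "embedding" as values) and llm.encode() for embeddings; the model id is an example embedding-capable checkpoint, not one named in the PR.

    from vllm import LLM

    # task selects between the generation and embedding code paths for
    # checkpoints that could serve either purpose.
    llm = LLM(model="intfloat/e5-mistral-7b-instruct", task="embedding")
    outputs = llm.encode(["the quick brown fox jumps over the lazy dog"])
    print(len(outputs[0].outputs.embedding))  # embedding dimensionality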
Russell Bryant
7dbe738d65
[Misc] benchmark: Add option to set max concurrency ( #9390 )
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-18 11:15:28 -07:00
Tyler Michael Smith
ae8b633ba3
[Bugfix] Fix offline_inference_with_prefix.py ( #9505 )
2024-10-18 16:59:19 +00:00
Cyrus Leung
1bbbcc0b1d
[CI/Build] Fix lint errors in mistral tokenizer ( #9504 )
2024-10-19 00:09:35 +08:00
Nick Hill
25aeb7d4c9
[BugFix] Fix and simplify completion API usage streaming ( #9475 )
2024-10-18 14:10:26 +00:00
tomeras91
d2b1bf55ec
[Frontend][Feature] Add jamba tool parser ( #9154 )
2024-10-18 10:27:48 +00:00
Nick Hill
1ffc8a7362
[BugFix] Typing fixes to RequestOutput.prompt and beam search ( #9473 )
2024-10-18 07:19:53 +00:00
Russell Bryant
944dd8edaf
[CI/Build] Use commit hash references for github actions ( #9430 )
2024-10-17 21:54:58 -07:00
Haoyu Wang
154a8ae880
[Qwen2.5] Support bnb quant for Qwen2.5 ( #9467 )
2024-10-18 04:40:14 +00:00
Joe Runde
de4008e2ab
[Bugfix][Core] Use torch.cuda.memory_stats() to profile peak memory usage ( #9352 )
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-17 22:47:27 -04:00
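The switch in #9352 reads peak usage from the CUDA caching allocator's statistics rather than a single snapshot; the relevant PyTorch calls look like this (standalone illustration, not vLLM's profiling code):

    import torch

    torch.cuda.reset_peak_memory_stats()
    x = torch.empty(1024, 1024, device="cuda")  # allocate something to profile

    stats = torch.cuda.memory_stats()
    # Peak bytes requested by tensors vs. peak bytes reserved from the driver
    # by the caching allocator.
    print(stats["allocated_bytes.all.peak"])
    print(stats["reserved_bytes.all.peak"])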
Dipika Sikka
48138a8415
[BugFix] Stop silent failures on compressed-tensors parsing ( #9381 )
2024-10-17 18:54:00 -07:00
Robert Shaw
343f8e0905
Support BERTModel (first encoder-only embedding model) ( #9056 )
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com>
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: laishzh <laishengzhang@gmail.com>
Co-authored-by: Max de Bayser <maxdebayser@gmail.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-10-17 23:21:01 +00:00
Shashwat Srijan
bb76538bbd
[Hardware][Neuron] Simplify model load for transformers-neuronx library ( #9380 )
2024-10-17 15:39:39 -07:00
sasha0552
d615b5c9f8
[Bugfix] Print warnings related to mistral_common tokenizer only once ( #9468 )
2024-10-17 21:44:20 +00:00
Kai Wu
d65049daab
[Bugfix] Add random_seed to sample_hf_requests in benchmark_serving script ( #9013 )
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-10-17 21:11:11 +00:00
bnellnm
eca2c5f7c0
[Bugfix] Fix support for dimension like integers and ScalarType ( #9299 )
2024-10-17 19:08:34 +00:00
Luka Govedič
0f41fbe5a3
[torch.compile] Fine-grained CustomOp enabling mechanism ( #9300 )
2024-10-17 18:36:37 +00:00
Cyrus Leung
7871659abb
[Misc] Remove commit id file ( #9470 )
2024-10-17 10:34:37 -07:00
Daniele
a2c71c5405
[CI/Build] remove .github from .dockerignore, add dirty repo check ( #9375 )
v0.6.3.post1
2024-10-17 10:25:06 -07:00
Kuntai Du
81ede99ca4
[Core] Deprecate block manager v1 and make block manager v2 the default ( #8704 )
Removes block manager v1 as the first step toward a prefix-caching-centric design: the code path is simplified so that only the v2 block manager, which performs much better with prefix caching, is used.
2024-10-17 11:38:15 -05:00
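With the v2 block manager as the default, prefix caching only needs to be switched on; a minimal sketch (placeholder model and prompts) in which the second request reuses the KV blocks computed for the shared prefix:

    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)
    params = SamplingParams(max_tokens=32)

    prefix = "You are a helpful assistant. Answer concisely.\n"
    out1 = llm.generate(prefix + "What is the capital of France?", params)
    out2 = llm.generate(prefix + "What is the capital of Japan?", params)  # shared prefix blocks are reused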