Yanyi Liu
a79e522984
[Model] Support pp for qwen2-vl ( #8696 )
2024-09-23 13:46:59 +00:00
Alex Brooks
9b8c8ba119
[Core][Frontend] Support Passing Multimodal Processor Kwargs ( #8657 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-09-23 07:44:48 +00:00
Lily Liu
c6bd70d772
[SpecDec][Misc] Cleanup, remove bonus token logic. ( #8701 )
2024-09-22 12:34:14 -07:00
litianjian
5b59532760
[Model][VLM] Add LLaVA-Onevision model support ( #8486 )
...
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-22 10:51:44 -07:00
youkaichao
0faab90eb0
[beam search] add output for manually checking the correctness ( #8684 )
2024-09-20 19:55:33 -07:00
Cyrus Leung
0455c46ed4
[Core] Factor out common code in SequenceData and Sequence ( #8675 )
2024-09-21 02:30:39 +00:00
Cyrus Leung
0057894ef7
[Core] Rename PromptInputs and inputs( #8673 )
2024-09-20 19:00:54 -07:00
Patrick von Platen
b4e4eda92e
[Bugfix][Core] Fix tekken edge case for mistral tokenizer ( #8640 )
2024-09-20 14:33:03 -07:00
Jiaxin Shan
260d40b5ea
[Core] Support Lora lineage and base model metadata management ( #6315 )
2024-09-20 06:20:56 +00:00
Charlie Fu
9cc373f390
[Kernel][Amd] Add fp8 kv cache support for rocm custom paged attention ( #8577 )
2024-09-19 17:37:57 +00:00
sroy745
3118f63385
[Bugfix] [Encoder-Decoder] Bugfix for encoder specific metadata construction during decode of encoder-decoder models. ( #8545 )
2024-09-19 02:24:15 +00:00
Tyler Michael Smith
db9120cded
[Kernel] Change interface to Mamba selective_state_update for continuous batching ( #8039 )
2024-09-18 20:05:06 +00:00
afeldman-nm
a8c1d161a7
[Core] *Prompt* logprobs support in Multi-step ( #8199 )
2024-09-18 08:38:43 -07:00
Alexander Matveev
7c7714d856
[Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH ( #8157 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-09-18 13:56:58 +00:00
Aaron Pham
9d104b5beb
[CI/Build] Update Ruff version ( #8469 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-18 11:00:56 +00:00
Cyrus Leung
6ffa3f314c
[CI/Build] Avoid CUDA initialization ( #8534 )
2024-09-18 10:38:11 +00:00
Tyler Michael Smith
8110e44529
[Kernel] Change interface to Mamba causal_conv1d_update for continuous batching ( #8012 )
2024-09-17 23:44:27 +00:00
youkaichao
fa0c114fad
[doc] improve installation doc ( #8550 )
...
Co-authored-by: Andy Dai <76841985+Imss27@users.noreply.github.com>
2024-09-17 16:24:06 -07:00
Patrick von Platen
a54ed80249
[Model] Add mistral function calling format to all models loaded with "mistral" format ( #8515 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-17 17:50:37 +00:00
chenqianfzh
9855b99502
[Feature][kernel] tensor parallelism with bitsandbytes quantization ( #8434 )
2024-09-17 08:09:12 -07:00
sroy745
1009e93c5d
[Encoder decoder] Add cuda graph support during decoding for encoder-decoder models ( #7631 )
2024-09-17 07:35:01 -07:00
youkaichao
99aa4eddaf
[torch.compile] register allreduce operations as custom ops ( #8526 )
2024-09-16 22:57:57 -07:00
Alex Brooks
1c1bb388e0
[Frontend] Improve Nullable kv Arg Parsing ( #8525 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-09-17 04:17:32 +00:00
Simon Mo
546034b466
[refactor] remove triton based sampler ( #8524 )
2024-09-16 20:04:48 -07:00
Luka Govedič
5d73ae49d6
[Kernel] AQ AZP 3/4: Asymmetric quantization kernels ( #7270 )
2024-09-16 11:52:40 -07:00
Nick Hill
acd5511b6d
[BugFix] Fix clean shutdown issues ( #8492 )
2024-09-16 09:33:46 -07:00
ElizaWszola
a091e2da3e
[Kernel] Enable 8-bit weights in Fused Marlin MoE ( #8032 )
...
Co-authored-by: Dipika <dipikasikka1@gmail.com>
2024-09-16 09:47:19 -06:00
Isotr0py
fc990f9795
[Bugfix][Kernel] Add IQ1_M quantization implementation to GGUF kernel ( #8357 )
2024-09-15 16:51:44 -06:00
youkaichao
47790f3e32
[torch.compile] add a flag to disable custom op ( #8488 )
2024-09-14 13:07:16 -07:00
youkaichao
a36e070dad
[torch.compile] fix functionalization ( #8480 )
2024-09-14 09:46:04 -07:00
ywfang
8a0cf1ddc3
[Model] support minicpm3 ( #8297 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-14 14:50:26 +00:00
Charlie Fu
1ef0d2efd0
[Kernel][Hardware][Amd]Custom paged attention kernel for rocm ( #8310 )
2024-09-13 17:01:11 -07:00
Nick Hill
18e9e1f7b3
[HotFix] Fix final output truncation with stop string + streaming ( #8468 )
2024-09-13 11:31:12 -07:00
Cyrus Leung
a84e598e21
[CI/Build] Reorganize models tests ( #7820 )
2024-09-13 10:20:06 -07:00
youkaichao
a2469127db
[misc][ci] fix quant test ( #8449 )
2024-09-13 17:20:14 +08:00
Isotr0py
9b4a3b235e
[CI/Build] Enable InternVL2 PP test only on single node ( #8437 )
2024-09-13 06:35:20 +00:00
Alexander Matveev
6821020109
[Bugfix] Fix async log stats ( #8417 )
2024-09-12 20:48:59 -07:00
Cyrus Leung
8427550488
[CI/Build] Update pixtral tests to use JSON ( #8436 )
2024-09-13 03:47:52 +00:00
shangmingc
40c396533d
[Bugfix] Mapping physical device indices for e2e test utils ( #8290 )
2024-09-13 11:06:28 +08:00
Cyrus Leung
5ec9c0fb3c
[Core] Factor out input preprocessing to a separate class ( #7329 )
2024-09-13 02:56:13 +00:00
Patrick von Platen
d31174a4e1
[Hotfix][Pixtral] Fix multiple images bugs ( #8415 )
2024-09-12 15:21:51 -07:00
Roger Wang
b61bd98f90
[CI/Build] Disable multi-node test for InternVL2 ( #8428 )
2024-09-12 15:05:35 -07:00
Nick Hill
551ce01078
[Core] Add engine option to return only deltas or final output ( #7381 )
2024-09-12 12:02:00 -07:00
William Lin
a6c0f3658d
[multi-step] add flashinfer backend ( #7928 )
2024-09-12 11:16:22 -07:00
Joe Runde
f2e263b801
[Bugfix] Offline mode fix ( #8376 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-09-12 11:11:57 -07:00
Alex Brooks
c6202daeed
[Model] Support multiple images for qwen-vl ( #8247 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-12 10:10:54 -07:00
Isotr0py
e56bf27741
[Bugfix] Fix InternVL2 inference with various num_patches ( #8375 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-12 10:10:35 -07:00
youkaichao
7de49aa86c
[torch.compile] hide slicing under custom op for inductor ( #8384 )
2024-09-12 00:11:55 -07:00
youkaichao
f842a7aff1
[misc] remove engine_use_ray ( #8126 )
2024-09-11 18:23:36 -07:00
Cody Yu
a65cb16067
[MISC] Dump model runner inputs when crashing ( #8305 )
2024-09-12 01:12:25 +00:00