LMDeploy Release V0.1.0
What's Changed
🚀 Features
- Add extra_requires to reduce dependencies by @RunningLeon in #580
- TurboMind 2 by @lzhangzz in #590
- Support loading hf model directly by @irexyc in #685 (see the sketch after this list)
- Convert model with hf repo_id by @irexyc in #774
- Support turbomind bf16 by @grimoire in #803
- Support image_embs input by @irexyc in #799
- Add api.py by @AllentDan in #805
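
Several of the features above change how models are loaded: a hf model can now be served or converted straight from its repo_id (#685, #774), and api.py (#805) adds a high-level entry point. Below is a minimal sketch of how these pieces fit together; the `pipeline` name and call style are assumptions based on how later LMDeploy releases document this API, and the repo_id is only illustrative.

```python
# Minimal sketch: loading a hf model directly, with no prior `lmdeploy convert` step.
# ASSUMPTIONS: the `pipeline` entry point and its call style are taken from
# later LMDeploy documentation; the repo_id below is illustrative.
from lmdeploy import pipeline

# Passing a hf repo_id downloads and loads the model directly (#685, #774).
pipe = pipeline('internlm/internlm-chat-7b')

# Batch of prompts in, list of responses out.
responses = pipe(['Hi, please introduce yourself'])
print(responses)
```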
💥 Improvements
- Fix Tokenizer encode by @AllentDan in #645
- Optimize for throughput by @lzhangzz in #701
- Replace mmengine with mmengine-lite by @zhouzaida in #715
- Set the default value of `max_context_token_num` to 1 by @lvhan028 in #761
- Add triton server test and workflow yml by @RunningLeon in #760
- Improve the build: enable ninja and the gold linker by @tpoisonooo in #767
- Report first-token-latency and token-latency percentiles by @lvhan028 in #736
- Unify prefill & decode passes by @lzhangzz in #775
- Add CUDA 12.1 build check CI by @irexyc in #782
- Automatically upload the CUDA 12.1 Python package to the release when a new tag is created by @irexyc in #784
- Report the inference benchmark of models of different sizes by @lvhan028 in #794
- Simplify block manager by @lzhangzz in #812
- Disable attention mask when it is not needed by @lzhangzz in #813
- FIFO pipe strategy for api_server by @AllentDan in #795
- Simplify the header of the benchmark table by @lvhan028 in #820
- Add encode for OpenCompass by @AllentDan in #828
- Fix: AWQ should save bin files by @hscspring in #793
- Support building docker image manually in CI by @RunningLeon in #825
🐞 Bug fixes
- Fix init of batch state by @lzhangzz in #682
- Fix TurboMind stream canceling by @grimoire in #686
- [Fix] Fix load_checkpoint_in_model bug by @HIT-cwh in #690
- Fix wrong eos_id and bos_id obtained through grpc api by @lvhan028 in #644
- Fix cache/output length calculation by @lzhangzz in #738
- [Fix] Skip empty batch by @lzhangzz in #747
- [Fix] Build docker image failed since `packaging` is missing by @lvhan028 in #753
- [Fix] Rollback the data type of `input_ids` to `TYPE_UINT32` in preprocessor's proto by @lvhan028 in #758
- Fix TurboMind build on sm<80 by @grimoire in #754
- Fix early-exit condition in attention kernel by @lzhangzz in #788
- Fix missed arguments when benchmark static inference performance by @lvhan028 in #787
- Fix extra colon in InternLMChat7B template by @C1rN09 in #796
- Fix local kv head num by @lvhan028 in #806
- Fix out-of-bound access by @lzhangzz in #809
- Set smem size for repetition penalty kernel by @lzhangzz in #818
- Fix cache verification by @lzhangzz in #821
- Fix finish_reason by @AllentDan in #816
- Fix TurboMind AWQ by @grimoire in #847
- Fix stop requests by awaiting before TurboMind's queue.get() by @AllentDan in #850
- [Fix] Fix meta tensor error by @pppppM in #848
- Fix cuda reinitialization in a multiprocessing setting by @grimoire in #862
- Launch Gradio server directly with a hf model by @AllentDan in #856
- Fix typo by @grimoire in #769
- Add chat template for Yi by @AllentDan in #779
- Fix api_server stop_session and end_session by @AllentDan in #835
- Return the iterator after erasing it from a map by @irexyc in #864
📚 Documentations
- [Docs] Update Supported Matrix by @pppppM in #679
- [Docs] Update KV8 Docs by @pppppM in #681
- [Doc] Update restful api doc by @AllentDan in #662
- Check-in user guide about turbomind config by @lvhan028 in #680
- Update benchmark user guide by @lvhan028 in #763
- [Docs] Fix typo in `restful_api` user guide by @maxchiron in #858
- [Docs] Fix typo in `restful_api` user guide by @maxchiron in #859
🌐 Other
- Bump version to v0.1.0a0 by @lvhan028 in #709
- Bump version to v0.1.0a1 by @lvhan028 in #776
- Bump version to v0.1.0a2 by @lvhan028 in #807
- Bump version to v0.1.0 by @lvhan028 in #834
New Contributors
- @zhouzaida made their first contribution in #715
- @C1rN09 made their first contribution in #796
- @maxchiron made their first contribution in #858
Full Changelog: v0.0.14...v0.1.0