
Releases: InternLM/lmdeploy

LMDeploy Release V0.2.4

22 Feb 03:44
24ea5dc

What's Changed

Full Changelog: v0.2.3...v0.2.4

LMDeploy Release V0.2.3

06 Feb 06:14
2831dc2

What's Changed

💥 Improvements

  • Remove caching tokenizer.json by @grimoire in #1074
  • Refactor get_logger to remove the dependency of MMLogger from mmengine by @yinfan98 in #1064
  • Use TM_LOG_LEVEL environment variable first by @zhyncs in #1071
  • Speed up the initialization of w8a8 model for torch engine by @yinfan98 in #1088
  • Make logging.logger's behavior consistent with MMLogger by @irexyc in #1092
  • Remove owned_session for torch engine by @grimoire in #1097
  • Unify engine initialization in pipeline by @irexyc in #1085
  • Add skip_special_tokens in GenerationConfig by @grimoire in #1091
  • Use default stop words for turbomind backend in pipeline by @irexyc in #1119
  • Add input_token_len to Response and update Response document by @AllentDan in #1115
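
The semantics of the new skip_special_tokens option can be illustrated with a toy detokenizer (the helper below is hypothetical, not lmdeploy's implementation):

```python
# Toy illustration of GenerationConfig's skip_special_tokens semantics
# (hypothetical helper, not lmdeploy code): when enabled, tokens marked
# as special are dropped from the decoded text; when disabled, they are
# emitted verbatim.
SPECIAL_TOKENS = {"<s>", "</s>", "<|im_end|>"}

def decode(tokens, skip_special_tokens=True):
    if skip_special_tokens:
        tokens = [t for t in tokens if t not in SPECIAL_TOKENS]
    return " ".join(tokens)

print(decode(["<s>", "hello", "world", "</s>"]))                    # hello world
print(decode(["<s>", "hello", "</s>"], skip_special_tokens=False))  # <s> hello </s>
```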

🐞 Bug fixes

  • Fix fast tokenizer swallows prefix space when there are too many white spaces by @AllentDan in #992
  • Fix turbomind CUDA runtime error invalid argument by @zhyncs in #1100
  • Add safety check for incremental decode by @AllentDan in #1094
  • Fix device type of get_ppl for turbomind by @RunningLeon in #1093
  • Fix pipeline init turbomind from workspace by @irexyc in #1126
  • Add dependency version check and fix ignore_eos logic by @grimoire in #1099
  • Change configuration_internlm.py to configuration_internlm2.py by @HIT-cwh in #1129
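
A typical hazard in incremental decoding is that a partially received multi-byte UTF-8 character decodes to the replacement character '�'. A minimal sketch of the kind of safety check that avoids this (illustrative only, not lmdeploy's actual code):

```python
# Illustrative sketch of a safety check for incremental decoding
# (not lmdeploy's implementation): hold back trailing bytes that do
# not yet form a complete UTF-8 character instead of emitting '�'.
def safe_incremental_decode(buffer: bytes):
    """Return (cleanly decodable prefix, bytes held back for the next step)."""
    # A UTF-8 character is at most 4 bytes, so at most the last 4 bytes
    # can belong to an incomplete character.
    for cut in range(len(buffer), max(len(buffer) - 4, 0) - 1, -1):
        try:
            return buffer[:cut].decode("utf-8"), buffer[cut:]
        except UnicodeDecodeError:
            continue
    return "", buffer

# '你' occupies three bytes; after receiving only two of them,
# nothing is emitted and both bytes are carried over to the next step.
text, held = safe_incremental_decode("你".encode("utf-8")[:2])
print(repr(text), len(held))  # '' 2
```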

Full Changelog: v0.2.2...v0.2.3

LMDeploy Release V0.2.2

31 Jan 09:57
4a28f12

Highlight

  • The k/v cache allocation strategy of the TurboMind engine has changed. The parameter cache_max_entry_count now denotes the proportion of free GPU memory rather than total GPU memory, with a default value of 0.8. This helps prevent OOM issues.
  • The pipeline API supports streaming inference. You may give it a try!
from lmdeploy import pipeline
pipe = pipeline('internlm/internlm2-chat-7b')
for item in pipe.stream_infer('hi, please intro yourself'):
    print(item)
  • Added an API key option and SSL support to api_server
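
The effect of the new allocation rule can be sketched with simple arithmetic (the memory figures below are hypothetical, and the helper is not lmdeploy code):

```python
# Sketch of the changed k/v cache budget rule: cache_max_entry_count
# now scales *free* GPU memory, whereas previously it scaled *total*
# GPU memory.
def kv_cache_budget_gb(total_gb, used_gb, cache_max_entry_count=0.8):
    free_gb = total_gb - used_gb
    return cache_max_entry_count * free_gb

# An 80 GB GPU with 30 GB already occupied by weights and activations:
new_budget = kv_cache_budget_gb(80, 30)  # 0.8 * 50 GB free = 40 GB
old_budget = 0.8 * 80                    # old rule: 64 GB, easy to OOM
print(new_budget, old_budget)
```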

What's Changed

Full Changelog: v0.2.1...v0.2.2

LMDeploy Release V0.2.1

19 Jan 10:38
e96e2b4

What's Changed

📚 Documentations

  • add guide about installation on cuda 12+ platform by @lvhan028 in #988

Full Changelog: v0.2.0...v0.2.1

LMDeploy Release V0.2.0

17 Jan 02:00
b319dce

What's Changed

Full Changelog: v0.1.0...v0.2.0

LMDeploy Release V0.1.0

18 Dec 12:10
477f2db

What's Changed

Full Changelog: v0.0.14...v0.1.0

LMDeploy Release V0.1.0a2

06 Dec 06:50
fddad30

What's Changed

💥 Improvements

  • Unify prefill & decode passes by @lzhangzz in #775
  • add cuda12.1 build check ci by @irexyc in #782
  • auto upload cuda12.1 python pkg to release when create new tag by @irexyc in #784
  • Report the inference benchmark of models with different size by @lvhan028 in #794
  • Add chat template for Yi by @AllentDan in #779

🐞 Bug fixes

  • Fix early-exit condition in attention kernel by @lzhangzz in #788
  • Fix missed arguments when benchmark static inference performance by @lvhan028 in #787
  • fix extra colon in InternLMChat7B template by @C1rN09 in #796
  • Fix local kv head num by @lvhan028 in #806

Full Changelog: v0.1.0a1...v0.1.0a2

LMDeploy Release V0.1.0a1

29 Nov 13:51
9c46b27

What's Changed

🐞 Bug fixes

  • [Fix] build docker image failed since packaging is missing by @lvhan028 in #753
  • [Fix] Rollback the data type of input_ids to TYPE_UINT32 in preprocessor's proto by @lvhan028 in #758
  • fix turbomind build on sm<80 by @grimoire in #754
  • fix typo by @grimoire in #769

Full Changelog: v0.1.0a0...v0.1.0a1

LMDeploy Release V0.1.0a0

23 Nov 13:05
a7c5007

What's Changed

Full Changelog: v0.0.14...v0.1.0a0

LMDeploy Release V0.0.14

09 Nov 12:13
7b20cfd

What's Changed

🐞 Bug fixes

  • [Fix] Qwen's quantization results are abnormal & Baichuan cannot be quantized by @pppppM in #605
  • FIX: fix stop_session func bug by @yunzhongyan0 in #578
  • fix benchmark serving computation mistake by @AllentDan in #630
  • fix Tokenizer load error when the path of the being-converted model is not writable by @irexyc in #669
  • fix tokenizer_info when convert the model by @irexyc in #661

Full Changelog: v0.0.13...v0.0.14