LMDeploy Release V0.5.2

@lvhan028 lvhan028 released this 26 Jul 08:07
7199b4e

Highlight

  • LMDeploy supports Llama3.1 and its tool calling. An example of calling "Wolfram Alpha" to perform complex mathematical calculations can be found here

What's Changed

🚀 Features

💥 Improvements

  • Remove the triton inference server backend "turbomind_backend" by @lvhan028 in #1986
  • Remove kv cache offline quantization by @AllentDan in #2097
  • Remove session_len and deprecated short names of the chat templates by @lvhan028 in #2105
  • Clarify that "n>1" in GenerationConfig is not supported yet by @lvhan028 in #2108

🐞 Bug fixes

🌐 Other

Full Changelog: v0.5.1...v0.5.2