What's Changed
🚀 Features
- Support MoE w8a8 in PyTorch engine by @grimoire in #2894
- Support DeepSeek-V3 FP8 by @grimoire in #2967
- Support new Cambricon backend by @JackWeiw in #3002
- Support MoE FP8 by @RunningLeon in #3007
- Add internlm3-dense (TurboMind) & chat template by @irexyc in #3024
- Support internlm3 on the PyTorch engine by @RunningLeon in #3026
- Support internlm3 quantization by @AllentDan in #3027
💥 Improvements
- Optimize AWQ kernel in PyTorch engine by @grimoire in #2965
- Support FP8 w8a8 for the PyTorch backend by @RunningLeon in #2959
- Optimize LoRA kernel by @grimoire in #2975
- Remove threadsafe mode by @grimoire in #2907
- Refactor async engine & TurboMind IO by @lzhangzz in #2968
- [dlinfer] Refine RoPE by @JackWeiw in #2984
- Expose spaces_between_special_tokens by @AllentDan in #2991 (see the sketch after this list)
- [dlinfer] Change the LLM op interface of paged_prefill_attention by @JackWeiw in #2977
- Update request logger by @lvhan028 in #2981
- Remove decoding by @grimoire in #3016
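
Regarding #2991: a minimal sketch of how the newly exposed option might be used, assuming it lands on lmdeploy's `GenerationConfig` (the model path below is an illustrative placeholder):

```python
# Minimal sketch, assuming #2991 exposes spaces_between_special_tokens
# on GenerationConfig; the model path is an illustrative placeholder.
from lmdeploy import GenerationConfig, pipeline

pipe = pipeline('internlm/internlm3-8b-instruct')
gen_config = GenerationConfig(
    max_new_tokens=128,
    skip_special_tokens=False,            # keep special tokens in the output
    spaces_between_special_tokens=False,  # don't insert spaces between them
)
print(pipe(['Hello'], gen_config=gen_config)[0].text)
```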
🐞 Bug fixes
- Fix build crash in nvcr.io/nvidia/pytorch:24.06-py3 image by @zgjja in #2964
- Add tool role in BaseChatTemplate as the tool response in messages by @AllentDan in #2979 (see the sketch after this list)
- Fix Ascend Dockerfile by @jinminxi104 in #2989
- Fix InternVL2 QK norm by @grimoire in #2987
- Fix XComposer2 when transformers is upgraded past 4.46 by @irexyc in #3001
- Fix get_ppl & get_logits by @lvhan028 in #3008
- Fix typo in w4a16 guide by @Yan-Xiangjun in #3018
- Fix blocked FP8 MoE kernel by @grimoire in #3009
- Fix async engine by @lzhangzz in #3029
- [hotfix] Fix get_ppl by @lvhan028 in #3023
- Fix MoE gating for DeepSeek V2 by @lzhangzz in #3030
- Fix empty response for pipeline by @lzhangzz in #3034
- Fix potential hang during TP model initialization by @lzhangzz in #3033
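
For context on #2979: a hedged sketch of what a messages list with a tool-response turn might look like (field names follow the common OpenAI-style schema as an illustration; the exact fields BaseChatTemplate accepts may differ):

```python
# Sketch of a messages list containing a tool-response turn (#2979).
# Field names follow the common OpenAI-style schema as an illustration;
# exact handling is up to BaseChatTemplate.
messages = [
    {'role': 'user', 'content': "What's the weather in Shanghai?"},
    {'role': 'assistant', 'content': '', 'tool_calls': [
        {'id': 'call_0', 'type': 'function',
         'function': {'name': 'get_weather', 'arguments': '{"city": "Shanghai"}'}},
    ]},
    # The tool's output is fed back as a `tool` role message:
    {'role': 'tool', 'content': '22°C, sunny', 'tool_call_id': 'call_0'},
]
```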
🌐 Other
- [ci] Add w8a8 and InternVL2.5 models to testcases by @zhulinJulia24 in #2949
- Bump version to v0.7.0 by @lvhan028 in #3010
New Contributors
- @zgjja made their first contribution in #2964
- @Yan-Xiangjun made their first contribution in #3018
Full Changelog: v0.6.5...v0.7.0