LMDeploy Release V0.2.6
Highlight
Support vision-languange models (VLM) inference pipeline and serving.
Currently, it supports the following models, Qwen-VL-Chat, LLaVA series v1.5, v1.6 and Yi-VL
- VLM Inference Pipeline
from lmdeploy import pipeline
from lmdeploy.vl import load_image
pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
Please refer to the detailed guide from here
- VLM serving by openai compatible server
lmdeploy server api_server liuhaotian/llava-v1.6-vicuna-7b --server-port 8000
- VLM Serving by gradio
lmdeploy serve gradio liuhaotian/llava-v1.6-vicuna-7b --server-port 6006
What's Changed
🚀 Features
- Add inference pipeline for VL models by @irexyc in #1214
- Support serving VLMs by @AllentDan in #1285
- Serve VLM by gradio by @irexyc in #1293
- Add pipeline.chat api for easy use by @irexyc in #1292
💥 Improvements
- Hide qos functions from swagger UI if not applied by @AllentDan in #1238
- Color log formatter by @grimoire in #1247
- optimize filling kv cache kernel in pytorch engine by @grimoire in #1251
- Refactor chat template and support accurate name matching. by @AllentDan in #1216
- Support passing json file to chat template by @AllentDan in #1200
- upgrade peft and check adapters by @grimoire in #1284
- better cache allocation in pytorch engine by @grimoire in #1272
- Fall back to base template if there is no chat_template in tokenizer_config.json by @AllentDan in #1294
🐞 Bug fixes
- lazy load convert_pv jit function by @grimoire in #1253
- [BUG] fix the case when num_used_blocks < 0 by @jjjjohnson in #1277
- Check bf16 model in torch engine by @grimoire in #1270
- fix bf16 check by @grimoire in #1281
- [Fix] fix triton server chatbot init error by @AllentDan in #1278
- Fix concatenate issue in profile serving by @ispobock in #1282
- fix torch tp lora adapter by @grimoire in #1300
- Fix crash when api_server loads a turbomind model by @irexyc in #1304
📚 Documentations
- fix config for readthedocs by @RunningLeon in #1245
- update badges in README by @lvhan028 in #1243
- Update serving guide including api_server and gradio by @lvhan028 in #1248
- rename restful_api.md to api_server.md by @lvhan028 in #1287
- Update readthedocs index by @lvhan028 in #1288
🌐 Other
- Parallelize testcase and refactor test workflow by @zhulinJulia24 in #1254
- Accelerate sample request in benchmark script by @ispobock in #1264
- Update eval ci cfg by @RunningLeon in #1259
- Test case bugfix and add restful interface testcases. by @zhulinJulia24 in #1271
- bump version to v0.2.6 by @lvhan028 in #1299
New Contributors
- @jjjjohnson made their first contribution in #1277
Full Changelog: v0.2.5...v0.2.6