Issues: NVIDIA/TensorRT-LLM
#783: [Issue Template] Short one-line summary of the issue #270 (pinned; opened Jan 1, 2024 by juney-nvidia)
#2681: The difference in quantization implementation between quantize.py and convert_checkpoint.py (opened Jan 12, 2025 by XA23i)
#2679: Error when building the TRT engine in the InternVL2 examples [bug] (opened Jan 10, 2025 by StMarou)
#2678: Inference with Qwen2-0.5b + Medusa failed [bug] (opened Jan 10, 2025 by shangshng)
#2677: Llama-3.2 SmoothQuant convert checkpoint error [bug] (opened Jan 10, 2025 by lyffly)
#2675: Difference in attention output compared to the HF engine's attention output [bug] (opened Jan 9, 2025 by krishnanpooja)
#2674: An error occurs: module 'torch.distributed' has no attribute 'ReduceOp' (opened Jan 9, 2025 by fangbaolei)
#2673: EAGLE model seems to be deployed but raises an error on inference [bug] (opened Jan 9, 2025 by nuxlear)
#2672: Prompt formatting for different versions of InternVL2 [bug] (opened Jan 8, 2025 by nzarif)
#2671: Help needed: no clear documentation/examples for implementing speculative decoding with backend serve (opened Jan 8, 2025 by e1ijah1)
#2667: trtllm-serve produces no output with Qwen2.5-7b [bug] [OpenAI API] (opened Jan 8, 2025 by Justin-12138)
#2666: fp8 quantization for CohereForCausalLM [Investigating] [Low Precision] [triaged] (opened Jan 7, 2025 by Alireza3242)
#2664: What are the supported low-bit (int8/fp8/int4) data types in MLP and attention layers? [Investigating] [Low Precision] [triaged] (opened Jan 6, 2025 by mirzadeh)
#2663: QTIP quantization support? [Investigating] [Low Precision] [triaged] (opened Jan 6, 2025 by aikitoria)
#2660: Segmentation fault: TensorRT-LLM crashes when using guided decoding (xgrammar) with KV cache reuse [bug] (opened Jan 6, 2025 by Somasundaram-Palaniappan)
#2659: [QST] Why does the f16xs8 mixed GEMM implementation differ between TRT-LLM and the native CUTLASS mixed GEMM example? [Investigating] [Performance] [triaged] (opened Jan 5, 2025 by danielhua23)
#2658: Qwen2 VL cannot be converted to a checkpoint with TensorRT-LLM [bug] [Investigating] [LLM API/Workflow] [triaged] (opened Jan 5, 2025 by xunuohope1107)
#2656: No module named 'tensorrt_llm.bindings' error message [triaged] (opened Jan 3, 2025 by maulikmadhavi)
#2655: setuptools conflict [Investigating] [Low Precision] [triaged] (opened Jan 3, 2025 by kanebay)
#2652: torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 'NoneType' object is not iterable [bug] [triaged] (opened Jan 3, 2025 by Whisht)
ProTip! Find all open issues with in-progress development work using the linked:pr search qualifier.
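For example, entering the query is:issue is:open linked:pr in the issues search bar should return only the open issues that have a linked pull request.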