[QST] Why is the implementation of f16xs8 mixed GEMM different between TRT-LLM and the native CUTLASS mixed GEMM example? #2659
Labels: Investigating, Performance, triaged
Dear TRT-LLM team,
Let's consider sm80 and f16s8. The CUTLASS example of the f16s8 TN mixed GEMM shown here is different from the TRT-LLM implementation. Specifically, to my knowledge, the TRT-LLM one applies a dequantization scale, but the CUTLASS one does not. My questions are:
Thanks for your time!
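
To make the difference I mean concrete, here is a minimal reference sketch in plain Python. It is not the kernel code from either project. The function name `mixed_gemm_ref` and the per-column `scales` layout are my own assumptions for illustration; the real kernels may apply scales differently (for example per group). With `scales=None` the s8 weights are only converted to floating point, which is what I understand the CUTLASS example to do; with `scales` given, each converted weight is also multiplied by a scale, which is what I understand TRT-LLM to do.

```python
def mixed_gemm_ref(a, b_int8, scales=None):
    """Reference f16 x s8 GEMM: C = A @ dequant(B).

    a      : M x K list of floats (stands in for the f16 activations)
    b_int8 : K x N list of ints in [-128, 127] (the s8 weights)
    scales : optional length-N list of per-column scales
             (hypothetical layout, assumed only for this sketch)
    """
    m, k, n = len(a), len(b_int8), len(b_int8[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                w = float(b_int8[p][j])   # plain s8 -> float conversion
                if scales is not None:
                    w *= scales[j]        # extra dequantization scale
                acc += a[i][p] * w
            c[i][j] = acc
    return c


a = [[1.0, 2.0]]
b = [[2, -4], [1, 3]]
no_scale = mixed_gemm_ref(a, b)                        # conversion only
with_scale = mixed_gemm_ref(a, b, scales=[0.5, 0.25])  # conversion + scale
```

With a per-column scale, the scaled result equals the unscaled result multiplied column-wise by `scales`, so in this simplified layout the scale could also be applied once to the output instead of inside the inner loop.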