
[QST] Why is the f16xs8 mixed GEMM implementation different between TRT-LLM and the native CUTLASS mixed GEMM example? #2659

Open
danielhua23 opened this issue Jan 5, 2025 · 2 comments

@danielhua23

Dear TRT-LLM team,

Let's consider sm80 and f16s8. The CUTLASS example of the f16s8 TN mixed GEMM shown here differs from the TRT-LLM implementation: specifically, to my knowledge, the TRT-LLM version applies the dequantization scale inside the kernel, while the native CUTLASS one does not. My questions are:

  1. Is the performance or accuracy of TRT-LLM's approach (fusing the dequantization scale) better than the native CUTLASS one for LLM linear layers? A minimal sketch of the math I mean follows this list.
  2. From here, I see that the TRT-LLM version seems to load operand B (s8) using LDS rather than LDSM, but I can't find an f16s8 LDS specialization in MmaTensorOpMultiplicandTileIterator; I only find an LDS specialization for TF32, which confuses me about the "LDS" part. Am I missing something? A sketch contrasting the two load paths also follows below.
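
For concreteness, here is a minimal host-side reference of the math I mean in question 1, assuming per-output-channel scales (the names `mixed_gemm_reference` and `scale` are mine, not from either codebase):

```cpp
#include <cstdint>

// Reference semantics of a weight-only-quantized linear layer:
//   D[m][n] = sum_k A[m][k] * (scale[n] * (float)B_s8[k][n])
// As far as I can tell, TRT-LLM fuses the per-output-channel scale[n] into
// the GEMM mainloop right after the in-register s8 -> f16 conversion, while
// the plain CUTLASS example only performs the s8 -> f16 conversion and
// leaves the scaling to the caller (or a separate epilogue).
void mixed_gemm_reference(const float* A, const int8_t* B, const float* scale,
                          float* D, int M, int N, int K) {
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < N; ++n) {
      float acc = 0.0f;
      for (int k = 0; k < K; ++k) {
        acc += A[m * K + k] * (scale[n] * static_cast<float>(B[k * N + n]));
      }
      D[m * N + n] = acc;
    }
  }
}
```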
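And for question 2, a sketch of the two shared-memory load paths as I understand them (the device-function names are mine; this is not the actual iterator code, just an illustration of LDSM vs. LDS):

```cpp
#include <cstdint>

// LDSM path: ldmatrix loads 8x8 tiles of 16-bit elements directly in the
// fragment layout that mma.m16n8k16 expects. On sm80, ldmatrix only supports
// .b16, which is presumably why it fits f16 operands but not raw s8 weights.
__device__ void load_b_ldsm(uint32_t (&frag)[4], const void* smem_ptr) {
  uint32_t addr = static_cast<uint32_t>(__cvta_generic_to_shared(smem_ptr));
  asm volatile(
      "ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%0,%1,%2,%3}, [%4];\n"
      : "=r"(frag[0]), "=r"(frag[1]), "=r"(frag[2]), "=r"(frag[3])
      : "r"(addr));
}

// LDS path: plain vectorized shared-memory loads. Each 32-bit load carries
// four s8 values, which are then converted to f16 in registers; the smem
// layout has to be arranged so that, after conversion, the registers match
// the fragment layout the f16 tensor-core MMA expects.
__device__ void load_b_lds(uint32_t (&frag)[4], const uint32_t* smem_ptr) {
#pragma unroll
  for (int i = 0; i < 4; ++i) {
    frag[i] = smem_ptr[i];
  }
}
```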

Thanks for your time!

@nv-guomingz added the Performance label on Jan 6, 2025
@nv-guomingz (Collaborator)

@Barry-Delaney could you please comment on this question?

@github-actions bot added the triaged and Investigating labels on Jan 6, 2025