tensorrt-llm: Accelerate LLM inference with TensorRT for higher throughput and lower latency

gamma · 2026-02-19 17:35:21 · 51 views · 0 comments

Optimizes LLM inference with NVIDIA TensorRT for high throughput and low latency. Use it for production deployment on NVIDIA GPUs (A100/H100), when PyTorch inference is too slow (the skill description claims 10-100x faster inference), or when serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.
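To make the in-flight batching idea concrete, here is a minimal toy simulation in plain Python. It is not TensorRT-LLM's scheduler and uses no TensorRT-LLM API; the function names, the fixed slot count, and the "one token per request per step" cost model are all assumptions made for illustration. It only counts decode steps under two policies: static batching, where a batch runs until its longest request finishes, and in-flight batching, where a freed slot is refilled right away.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when each batch runs until its longest request finishes.

    `lengths` holds the number of tokens each request generates (assumed >= 1).
    """
    steps = 0
    for i in range(0, len(lengths), batch_size):
        # The whole batch is held until its slowest member is done.
        steps += max(lengths[i:i + batch_size])
    return steps


def inflight_batching_steps(lengths, batch_size):
    """Decode steps when a finished request's slot is refilled immediately."""
    waiting = deque(lengths)
    active = []  # remaining tokens for each running request
    steps = 0
    while waiting or active:
        # Fill any free slots from the queue before the next decode step.
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        steps += 1
        # Every active request emits one token; drop those that just finished.
        active = [r - 1 for r in active if r > 1]
    return steps


if __name__ == "__main__":
    # Hypothetical output lengths for six requests, two slots per batch.
    lengths = [2, 8, 3, 8, 2, 2]
    print("static:", static_batching_steps(lengths, 2))
    print("in-flight:", inflight_batching_steps(lengths, 2))
```

With mixed output lengths, the in-flight policy needs fewer steps because short requests no longer wait for the longest request in their batch. Real gains depend on the model, hardware, and traffic pattern, which this sketch does not model.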

Skill package URL: https://skillsmp.com/skills/davila7-claude-code-templates-cli-tool-components-skills-ai-research-inference-serving-tensorrt-llm-skill-md


