Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA

rank 0 · 0 points · 1 source · primary Hacker News Front Page

Summary

jmaczan has open-sourced Tiny-vLLM, a high-performance LLM inference engine written in C++ and CUDA and presented as a smaller version of vLLM. The project is available on GitHub, where it has 141 stars and 7 forks.

Why it matters

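As a smaller C++ and CUDA counterpart to vLLM, Tiny-vLLM gives developers a more compact codebase to study and build on, which could encourage wider adoption and further development of high-performance LLM inference engines.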

Topics

ai c cuda llm machine-learning

Related coverage

Hacker News Front Page

Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA

5/30/2026, 3:31:20 PM

