Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA

rank 0 · 0 points · 1 source · primary: Hacker News Front Page

open source

Summary

jmaczan has open-sourced Tiny-vLLM, a high-performance LLM inference engine written in C++ and CUDA and positioned as a smaller version of vLLM. The project is available on GitHub, where it had 141 stars and 7 forks at the time of writing.

Why it matters

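Tiny-vLLM is a compact, open-source counterpart to vLLM written in C++ and CUDA. It could make high-performance LLM inference engines easier to study and build on, which may encourage wider adoption and further development of such engines.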

Related coverage

Hacker News Front Page: Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA (5/30/2026, 3:31:20 PM)
