CUDA

When does fragmentation occur in the CUDA caching allocator? (docs.pytorch.org)

0 points, 1 source, 1 minute ago

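In PyTorch's CUDA caching allocator, fragmentation occurs when certain allocation patterns leave the cached free memory split into blocks that are individually too small, or spread across separate segments, so a request cannot be served even though the total free space would be enough.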

cuda memory-fragmentation pytorch
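
To make that failure mode concrete, here is a minimal sketch in Python of a toy caching allocator. It is a hypothetical simplification, not PyTorch's implementation: memory is reserved in fixed segments, each request must fit inside one free block, oversized blocks are split on allocation, and freed neighbours are merged only within the same segment. The names `Block` and `ToyCachingAllocator` are illustrative, and sizes are plain integers in MB.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare blocks by identity, not by field values
class Block:
    segment: int  # which reserved segment the block lives in (toy assumption)
    offset: int   # start offset within that segment, in MB
    size: int     # size in MB
    free: bool = True


class ToyCachingAllocator:
    """Hypothetical toy model of a caching allocator (not PyTorch's code).

    A request must fit inside one free block; blocks are split on allocation
    and coalesced on release, but never across segment boundaries.
    """

    def __init__(self, segment_sizes):
        # Each segment starts as one free block; the list stays in
        # (segment, offset) order so physical neighbours are list neighbours.
        self.blocks = [Block(i, 0, s) for i, s in enumerate(segment_sizes)]

    def allocate(self, size):
        # Best fit: the smallest free block that can hold the request.
        candidates = [b for b in self.blocks if b.free and b.size >= size]
        if not candidates:
            return None  # fails even if the total free space is large enough
        block = min(candidates, key=lambda b: b.size)
        if block.size > size:
            # Split off the unused tail as a new free block in the same segment.
            tail = Block(block.segment, block.offset + size, block.size - size)
            self.blocks.insert(self.blocks.index(block) + 1, tail)
            block.size = size
        block.free = False
        return block

    def release(self, block):
        block.free = True
        self._coalesce()

    def _coalesce(self):
        # Merge physically adjacent free blocks, but only within one segment.
        merged = []
        for b in self.blocks:
            prev = merged[-1] if merged else None
            if (prev is not None and prev.free and b.free
                    and prev.segment == b.segment
                    and prev.offset + prev.size == b.offset):
                prev.size += b.size
            else:
                merged.append(b)
        self.blocks = merged

    def total_free(self):
        return sum(b.size for b in self.blocks if b.free)

    def largest_free_block(self):
        return max((b.size for b in self.blocks if b.free), default=0)


if __name__ == "__main__":
    alloc = ToyCachingAllocator([20, 20])  # two 20 MB segments
    a, b, c, d = [alloc.allocate(10) for _ in range(4)]
    alloc.release(a)  # frees 10 MB in segment 0
    alloc.release(c)  # frees 10 MB in segment 1
    print(alloc.total_free(), alloc.largest_free_block())  # 20 10
    print(alloc.allocate(20))  # None: 20 MB free in total, but split across segments
    alloc.release(b)  # b is adjacent to a in segment 0, so the two merge
    print(alloc.largest_free_block())  # 20
```

Best fit and the split/merge rules above are simplifications chosen for illustration. PyTorch's own documentation describes allocator settings in `PYTORCH_CUDA_ALLOC_CONF`, such as `max_split_size_mb` and `expandable_segments`, that target this kind of fragmentation.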

Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA (github.com)

0 points, 1 source, 1 minute ago

jmaczan has open-sourced Tiny-vLLM, a high-performance LLM inference engine written in C++ and CUDA and designed as a smaller version of vLLM. The project is on GitHub, where it has 141 stars and 7 forks.

ai c cuda llm machine-learning