ollama/llm/server.go at parth/decrease-concurrent-download-hf

mirror of https://github.com/ollama/ollama.git synced 2026-03-28 03:08:44 +07:00

Jesse Gross 172b5924af llm: Avoid integer underflow on llama engine memory layout

On the llama engine, when we compute the memory layout, we reserve
a buffer to allow for some flexibility for incorrect estimates.
This is subtracted from GPU free memory and on GPUs with limited
memory, it may underflow.
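
The underflow described above can be illustrated with a minimal sketch. It is not the code from this commit: the helper name `saturatingSub` and the sizes used are hypothetical, and it assumes only that free memory and the reserved buffer are both held as unsigned 64-bit integers. In Go, subtracting a larger `uint64` from a smaller one wraps around to a very large value rather than going negative, so one common guard is to clamp the result at zero.

```go
package main

import "fmt"

// saturatingSub returns a - b, clamped at zero instead of wrapping around.
// Hypothetical helper for illustration; not taken from the ollama source.
func saturatingSub(a, b uint64) uint64 {
	if b > a {
		return 0
	}
	return a - b
}

func main() {
	// Assumed values: a 512 MiB safety buffer and a GPU reporting 256 MiB free.
	reserve := uint64(512 << 20)
	free := uint64(256 << 20)

	// Plain unsigned subtraction wraps around to a huge value.
	fmt.Println(free - reserve)

	// The clamped version reports no usable memory instead.
	fmt.Println(saturatingSub(free, reserve)) // 0
}
```

With a clamp like this, a GPU whose free memory is smaller than the buffer is treated as having no space available for layers, rather than appearing to have nearly unlimited space.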

Fixes #13494

2025-12-19 15:48:15 -08:00

55 KiB
