runner: fix swallowed error in allocModel graph reservation

In allocModel(), the error from the first call, reserveWorstCaseGraph(true),
was silently discarded: the error branch used `return nil` instead of
`return err`.

This meant that if the prompt-sized graph reservation failed (e.g. due
to insufficient memory), the error was swallowed, allocModel reported
success, and the model appeared to load correctly. Subsequent inference
would then fail in unexpected ways because the worst-case graph was
never properly reserved.

Fix: return the actual error so the caller can handle the failure
(retry with reduced parallelism, report OOM, etc.).

Co-Authored-By: Claude (claude-opus-4-6) <noreply@anthropic.com>
easonysliu
2026-03-14 10:35:40 +08:00
committed by Jesse Gross
parent 856c047a6c
commit 810d4f9c22

@@ -1231,7 +1231,7 @@ func (s *Server) allocModel(
     err = s.reserveWorstCaseGraph(true)
     if err != nil {
-        return nil
+        return err
     }
     return s.reserveWorstCaseGraph(false)
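
The effect of the one-line change can be reproduced in isolation. The sketch below is not the runner's code: the `server` type, its `failPrompt` and `reserved` fields, and the `errOutOfMemory` sentinel are hypothetical stand-ins, and only the control flow of the two `allocModel` variants mirrors the diff. It assumes the prompt-sized reservation fails and compares what each variant reports to its caller.

```go
package main

import (
	"errors"
	"fmt"
)

// errOutOfMemory is a hypothetical sentinel standing in for whatever error
// a failed graph reservation actually produces.
var errOutOfMemory = errors.New("graph reservation: out of memory")

// server is a hypothetical stand-in for the runner's Server type.
type server struct {
	failPrompt bool   // force the prompt-sized reservation to fail
	reserved   []bool // arguments of the reservations that succeeded, in order
}

// reserveWorstCaseGraph mimics the two reservations made by allocModel:
// first with prompt=true, then with prompt=false.
func (s *server) reserveWorstCaseGraph(prompt bool) error {
	if prompt && s.failPrompt {
		return errOutOfMemory
	}
	s.reserved = append(s.reserved, prompt)
	return nil
}

// allocModelBuggy reproduces the pre-fix control flow: the error branch
// returns nil, so a failed reservation is reported as success.
func (s *server) allocModelBuggy() error {
	err := s.reserveWorstCaseGraph(true)
	if err != nil {
		return nil // bug: the error is swallowed
	}
	return s.reserveWorstCaseGraph(false)
}

// allocModelFixed reproduces the post-fix control flow: the error is
// propagated so the caller can react to it.
func (s *server) allocModelFixed() error {
	err := s.reserveWorstCaseGraph(true)
	if err != nil {
		return err
	}
	return s.reserveWorstCaseGraph(false)
}

func main() {
	buggy := &server{failPrompt: true}
	fmt.Println("buggy:", buggy.allocModelBuggy()) // prints "buggy: <nil>"

	fixed := &server{failPrompt: true}
	fmt.Println("fixed:", fixed.allocModelFixed()) // prints "fixed: graph reservation: out of memory"
}
```

In the buggy variant the caller sees a nil error even though no graph was reserved, which matches the "model appeared to load correctly" symptom described in the commit message. In the fixed variant the same failure surfaces immediately.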