gpu: mention Docker Model Runner (#23810)

<!--Delete sections as needed -->

## Description

Mention Docker Model Runner as a way to test GPU support on Windows with NVIDIA
GPUs.

<img width="952" height="603" alt="Screenshot 2025-12-04 at 17 33 12"
src="https://github.com/user-attachments/assets/92d68d8d-097e-4599-9e73-12121f495359"
/>


## Reviews

<!-- Notes for reviewers here -->
<!-- List applicable reviews (optionally @tag reviewers) -->

- [ ] Technical review
- [ ] Editorial review
- [ ] Product review

---------

Signed-off-by: Dorin Geman <dorin.geman@docker.com>
Co-authored-by: Allie Sadler <102604716+aevesdocker@users.noreply.github.com>
Commit: 9963fecdaf (parent: a4b77bfdc7)
Author: Dorin-Andrei Geman
Date: 2025-12-05 12:13:17 +02:00
Committed by: GitHub

@@ -63,16 +63,32 @@ GPU Device 0: "GeForce RTX 2060 with Max-Q Design" with compute capability 7.5
 = 2724.379 single-precision GFLOP/s at 20 flops per interaction
 ```

-## Run a real-world model: Llama2 with Ollama
+## Run a real-world model: SmolLM2 with Docker Model Runner

-Use the [official Ollama image](https://hub.docker.com/r/ollama/ollama) to run the Llama2 LLM with GPU acceleration:
+> [!NOTE]
+>
+> Docker Model Runner with vLLM for Windows with WSL2 is available starting with Docker Desktop 4.54.
+
+Use Docker Model Runner to run the SmolLM2 LLM with vLLM and GPU acceleration:

 ```console
-$ docker run --gpus=all -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
+$ docker model install-runner --backend vllm --gpu cuda
 ```

-Then start the model:
+Check that it's correctly installed:

 ```console
-$ docker exec -it ollama ollama run llama2
+$ docker model status
+Docker Model Runner is running
+
+Status:
+llama.cpp: running llama.cpp version: c22473b
+vllm: running vllm version: 0.11.0
 ```
+
+Run the model:
+
+```console
+$ docker model run ai/smollm2-vllm hi
+Hello! I'm sure everything goes smoothly here. How can I assist you today?
+```
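Once the vLLM backend is installed, the model can also be queried through Model Runner's OpenAI-compatible API rather than the CLI. The sketch below is only an assumption-labelled example: it presumes host-side TCP access to Model Runner is enabled on the default port 12434 and that the `/engines/v1/chat/completions` path applies; adjust the host, port, and path to match your Docker Desktop configuration.

```console
$ # Assumption: Model Runner TCP host access is enabled on port 12434.
$ curl http://localhost:12434/engines/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2-vllm",
        "messages": [{"role": "user", "content": "hi"}]
      }'
```

If the endpoint is reachable, the response follows the OpenAI chat completion format, so OpenAI-compatible clients can be pointed at the same base URL.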