# Image Generation in Ollama (Experimental) Generate images from text prompts using local AI models. ## Quick Start ```bash # Run with a prompt ollama run z-image "a sunset over mountains" Generating: step 30/30 Image saved to: /tmp/ollama-image-1704067200.png ``` On macOS, the generated image will automatically open in Preview. ## Supported Models | Model | VRAM Required | Notes | |-------|---------------|-------| | z-image | ~12GB | Based on Flux architecture | ## CLI Usage ```bash # Generate an image ollama run z-image "a cat playing piano" # Check if model is running ollama ps # Stop the model ollama stop z-image ``` ## API ### OpenAI-Compatible Endpoint ```bash POST /v1/images/generations ``` **Request:** ```json { "model": "z-image", "prompt": "a sunset over mountains", "size": "1024x1024", "response_format": "b64_json" } ``` **Response:** ```json { "created": 1704067200, "data": [ { "b64_json": "iVBORw0KGgo..." } ] } ``` ### Example: cURL ```bash curl http://localhost:11434/v1/images/generations \ -H "Content-Type: application/json" \ -d '{ "model": "z-image", "prompt": "a white cat", "size": "1024x1024" }' ``` ### Example: Save to File ```bash curl -s http://localhost:11434/v1/images/generations \ -H "Content-Type: application/json" \ -d '{ "model": "z-image", "prompt": "a white cat", "size": "1024x1024" }' | jq -r '.data[0].b64_json' | base64 -d > image.png ``` ### Streaming Progress Enable streaming to receive progress updates via SSE: ```bash curl http://localhost:11434/v1/images/generations \ -H "Content-Type: application/json" \ -d '{"model": "z-image", "prompt": "a sunset", "stream": true}' ``` Events: ``` event: progress data: {"step": 1, "total": 30} event: progress data: {"step": 2, "total": 30} ... event: done data: {"created": 1704067200, "data": [{"b64_json": "..."}]} ``` ## Parameters | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | model | string | required | Model name | | prompt | string | required | Text description of image | | size | string | "1024x1024" | Image dimensions (WxH) | | n | int | 1 | Number of images (currently only 1 supported) | | response_format | string | "b64_json" | "b64_json" or "url" | | stream | bool | false | Enable progress streaming | ## Requirements - macOS with Apple Silicon (M1/M2/M3/M4) - CUDA: tested on CUDA 12 Blackwell, more testing coming soon - Sufficient VRAM (see model table above) - Ollama built with MLX support ## Limitations - macOS only (uses MLX backend) - Single image per request - Fixed step count (30 steps) - Modelfiles not yet supported (use `ollama create` from model directory) --- # Tensor Model Storage Format Tensor models store each tensor as a separate blob with metadata in the manifest. This enables faster downloads (parallel fetching) and deduplication (shared tensors are stored once). ## Manifest Structure The manifest follows the standard ollama format with tensor-specific layer metadata: ```json { "schemaVersion": 2, "mediaType": "application/vnd.docker.distribution.manifest.v2+json", "config": { "digest": "sha256:...", "size": 1234 }, "layers": [ { "mediaType": "application/vnd.ollama.image.tensor", "digest": "sha256:25b36eed...", "size": 49807448, "name": "text_encoder/model.layers.0.mlp.down_proj.weight", "dtype": "BF16", "shape": [2560, 9728] }, { "mediaType": "application/vnd.ollama.image.json", "digest": "sha256:abc123...", "size": 512, "name": "text_encoder/config.json" } ] } ``` Each tensor layer includes: - `name`: Path-style tensor name (e.g., `text_encoder/model.layers.0.mlp.down_proj.weight`) - `dtype`: Data type (BF16, F32, etc.) - `shape`: Tensor dimensions Config layers use the same path-style naming (e.g., `tokenizer/tokenizer.json`). ## Blob Format Each tensor blob is a minimal safetensors file: ``` [8 bytes: header size (uint64 LE)] [~80 bytes: JSON header, padded to 8-byte alignment] [N bytes: raw tensor data] ``` Header contains a single tensor named `"data"`: ```json {"data":{"dtype":"BF16","shape":[2560,9728],"data_offsets":[0,49807360]}} ``` ## Why Include the Header? The ~88 byte safetensors header enables MLX's native `mlx_load_safetensors` function, which: 1. **Uses mmap** - Maps file directly into memory, no copies 2. **Zero-copy to GPU** - MLX reads directly from mapped pages 3. **No custom code** - Standard MLX API, battle-tested Without the header, we'd need custom C++ code to create MLX arrays from raw mmap'd data. MLX's public API doesn't expose this - it always copies when creating arrays from external pointers. The overhead is negligible: 88 bytes per tensor = ~100KB total for a 13GB model (0.0007%). ## Why Per-Tensor Blobs? **Deduplication**: Blobs are content-addressed by SHA256. If two models share identical tensors (same weights, dtype, shape), they share the same blob file. Example: Model A and Model B both use the same text encoder. The text encoder's 400 tensors are stored once, referenced by both manifests. ``` ~/.ollama/models/ blobs/ sha256-25b36eed... <- shared by both models sha256-abc123... manifests/ library/model-a/latest <- references sha256-25b36eed library/model-b/latest <- references sha256-25b36eed ``` ## Import Flow ``` cd ./weights/Z-Image-Turbo ollama create z-image 1. Scan component directories (text_encoder/, transformer/, vae/) 2. For each .safetensors file: - Extract individual tensors - Wrap each in minimal safetensors format (88B header + data) - Write to blob store (SHA256 content-addressed) - Add layer entry to manifest with path-style name 3. Copy config files (*.json) as config layers 4. Write manifest ``` ## FP8 Quantization Z-Image supports FP8 quantization to reduce memory usage by ~50% while maintaining image quality. ### Usage ```bash cd ./weights/Z-Image-Turbo ollama create z-image-fp8 --quantize fp8 ``` This quantizes weights during import. The resulting model will be ~15GB instead of ~31GB.