Mirror of https://github.com/ollama/ollama.git, synced 2026-03-28 03:08:44 +07:00.
TeaCache:

- Timestep embedding similarity caching for diffusion models
- Polynomial rescaling with configurable thresholds
- Reduces transformer forward passes by ~30-50%

FP8 quantization:

- Support for FP8 quantized models (8-bit weights with scales)
- `QuantizedMatmul` on Metal, `Dequantize` on CUDA
- Client-side quantization via `ollama create --quantize fp8`

Other bug fixes:

- Fix `/api/show` API for image generation models
- Server properly returns model info (architecture, parameters, quantization)
- Memory allocation optimizations
- CLI improvements for image generation
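The TeaCache idea above can be sketched in a few lines: compare the current timestep embedding to the previous one, pass the distance through a rescaling polynomial, accumulate it, and reuse the cached transformer output while the accumulated value stays under a threshold. This is a minimal illustration under assumed names; `relL1`, `rescale`, the coefficients, and the threshold are hypothetical, not the engine's actual code.

```go
package main

import (
	"fmt"
	"math"
)

// relL1 computes the mean relative L1 distance between two embedding vectors.
func relL1(a, b []float64) float64 {
	var num, den float64
	for i := range a {
		num += math.Abs(a[i] - b[i])
		den += math.Abs(b[i])
	}
	if den == 0 {
		return 0
	}
	return num / den
}

// rescale evaluates a polynomial (coefficients highest degree first) on the
// raw distance via Horner's method. The coefficients are illustrative; the
// real ones are fitted per model.
func rescale(x float64, coeffs []float64) float64 {
	y := 0.0
	for _, c := range coeffs {
		y = y*x + c
	}
	return y
}

// shouldSkip accumulates rescaled distances and reports whether the cached
// transformer output can be reused for this denoising step. The accumulator
// resets whenever a full forward pass is required.
func shouldSkip(acc *float64, prev, cur []float64, threshold float64, coeffs []float64) bool {
	*acc += rescale(relL1(cur, prev), coeffs)
	if *acc < threshold {
		return true // embeddings barely moved: reuse cached output
	}
	*acc = 0 // run the full forward pass and reset
	return false
}

func main() {
	prev := []float64{1, 2, 3}
	cur := []float64{1.01, 2.02, 3.03} // small drift between steps
	acc := 0.0
	// {1, 0} is the identity polynomial, so the raw distance is used directly.
	fmt.Println(shouldSkip(&acc, prev, cur, 0.1, []float64{1, 0}))
}
```

Because consecutive timestep embeddings change slowly through most of the schedule, many steps fall under the threshold, which is where the ~30-50% reduction in forward passes comes from.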
# MLX Engine
Experimental MLX backend for running models on Apple Silicon and CUDA.
## Build

```shell
go build -tags mlx -o engine ./x/imagegen/cmd/engine
```
## Text Generation

```shell
./engine -model /path/to/model -prompt "Hello" -max-tokens 100
```

Options:

- `-temperature`: sampling temperature (default 0.7)
- `-top-p`: nucleus sampling (default 0.9)
- `-top-k`: top-k sampling (default 40)

Supported architectures: Llama, Gemma3, GPT-OSS
## Image Generation

```shell
./engine -zimage -model /path/to/z-image -prompt "a cat" -output cat.png
```

Options:

- `-width`, `-height`: image dimensions (default 1024x1024)
- `-steps`: denoising steps (default 9)
- `-seed`: random seed (default 42)
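The `-seed` flag works the way diffusion samplers generally do: the seed initializes the RNG that draws the initial latent noise, so the same seed, prompt, and step count reproduce the same image. A minimal sketch under an assumed helper name (`initialLatents` is illustrative, not the engine's API):

```go
package main

import (
	"fmt"
	"math/rand"
)

// initialLatents draws the starting Gaussian noise for a diffusion sampler
// from a seeded RNG, making generation deterministic for a given seed.
func initialLatents(seed int64, n int) []float64 {
	rng := rand.New(rand.NewSource(seed))
	latents := make([]float64, n)
	for i := range latents {
		latents[i] = rng.NormFloat64() // standard normal noise
	}
	return latents
}

func main() {
	a := initialLatents(42, 4)
	b := initialLatents(42, 4)
	fmt.Println(a[0] == b[0] && a[3] == b[3]) // same seed, same noise
}
```

Varying only `-seed` therefore changes the starting noise (and thus the image) while every other setting stays fixed.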