ollama

mirror of https://github.com/ollama/ollama.git synced 2026-03-27 02:58:43 +07:00

Files

Patrick Devine d727aacd04 mlx: quantized embeddings, fast SwiGLU, and runtime fixes (#14884)

Add QuantizedEmbedding and EmbeddingLayer interface so models can
use quantized embedding weights and expose tied output projections.
This change updates gemma3, glm4_moe_lite, llama, qwen3, and qwen3_5
to use the new interface.

2026-03-17 11:21:38 -07:00

gemma3

mlx: quantized embeddings, fast SwiGLU, and runtime fixes (#14884)

2026-03-17 11:21:38 -07:00

glm4_moe_lite

mlx: quantized embeddings, fast SwiGLU, and runtime fixes (#14884)

2026-03-17 11:21:38 -07:00

llama

mlx: quantized embeddings, fast SwiGLU, and runtime fixes (#14884)

2026-03-17 11:21:38 -07:00

mlx: quantized embeddings, fast SwiGLU, and runtime fixes (#14884)

2026-03-17 11:21:38 -07:00

qwen3

mlx: quantized embeddings, fast SwiGLU, and runtime fixes (#14884)

2026-03-17 11:21:38 -07:00

qwen3_5

mlx: quantized embeddings, fast SwiGLU, and runtime fixes (#14884)

2026-03-17 11:21:38 -07:00

qwen3_5_moe

MLX: add header vendoring and remove go build tag (#14642)

2026-03-09 17:24:45 -07:00