{
"01-ai/yi-1.5-34b-chat.description": "01.AIs latest open-source fine-tuned model with 34B parameters, supporting multiple dialogue scenarios, trained on high-quality data and aligned with human preferences.",
"01-ai/yi-1.5-9b-chat.description": "01.AIs latest open-source fine-tuned model with 9B parameters, supporting multiple dialogue scenarios, trained on high-quality data and aligned with human preferences.",
"360/deepseek-r1.description": "360-deployed DeepSeek-R1 uses large-scale RL in post-training to greatly boost reasoning with minimal labels. It matches OpenAI o1 on math, code, and natural language reasoning tasks.",
"360gpt-pro-trans.description": "A translation-specialized model, deeply fine-tuned for leading translation quality.",
"360gpt-pro.description": "360GPT Pro is a key 360 AI model with efficient text processing for diverse NLP scenarios, supporting long-text understanding and multi-turn dialogue.",
"360gpt-turbo-responsibility-8k.description": "360GPT Turbo Responsibility 8K emphasizes semantic safety and responsibility for content-sensitive applications, ensuring accurate and robust user experiences.",
"360gpt-turbo.description": "360GPT Turbo delivers strong compute and chat capability with excellent semantic understanding and generation efficiency, ideal for enterprise and developers.",
"360gpt2-o1.description": "360gpt2-o1 builds chain-of-thought via tree search with a reflection mechanism and RL training, enabling self-reflection and self-correction.",
"360gpt2-pro.description": "360GPT2 Pro is an advanced NLP model from 360 with excellent text generation and understanding, especially for creative tasks, handling complex transformations and roleplay.",
"360zhinao2-o1.5.description": "360 Zhinao most powerful reasoning model, featuring the strongest capabilities and supporting both tool calling and advanced reasoning.",
"360zhinao2-o1.description": "360zhinao2-o1 builds chain-of-thought via tree search with a reflection mechanism and RL training, enabling self-reflection and self-correction.",
"360zhinao3-o1.5.description": "360 Zhinao Next-Generation Reasoning Model.",
"4.0Ultra.description": "Spark Ultra is the most powerful model in the Spark series, improving text understanding and summarization while upgrading web search. It is a comprehensive solution for boosting workplace productivity and accurate responses, positioning it as a leading intelligent product.",
"AnimeSharp.description": "AnimeSharp (aka \"4x-AnimeSharp\") is an open-source super-resolution model based on ESRGAN by Kim2091, focused on upscaling and sharpening anime-style images. It was renamed from \"4x-TextSharpV1\" in February 2022, originally also for text images but heavily optimized for anime content.",
"Baichuan2-Turbo.description": "Uses search augmentation to connect the model with domain and web knowledge. Supports PDF/Word uploads and URL inputs for timely, comprehensive retrieval and professional, accurate outputs.",
"Baichuan3-Turbo-128k.description": "With a 128K ultra-long context window, it is optimized for high-frequency enterprise scenarios with major gains and strong value. Compared to Baichuan2, content creation improves by 20%, knowledge QA by 17%, and roleplay by 40%. Overall performance is better than GPT-3.5.",
"Baichuan3-Turbo.description": "Optimized for high-frequency enterprise scenarios with major gains and strong value. Compared to Baichuan2, content creation improves by 20%, knowledge QA by 17%, and roleplay by 40%. Overall performance is better than GPT-3.5.",
"Baichuan4-Air.description": "A top-performing model in China, surpassing major overseas models on Chinese tasks like knowledge, long-form text, and creative generation. It also features industry-leading multimodal capabilities with strong results on authoritative benchmarks.",
"Baichuan4-Turbo.description": "A top-performing model in China, surpassing major overseas models on Chinese tasks like knowledge, long-form text, and creative generation. It also features industry-leading multimodal capabilities with strong results on authoritative benchmarks.",
"Baichuan4.description": "Top domestic performance, surpassing leading overseas models on Chinese tasks like encyclopedic knowledge, long text, and creative generation. Also offers industry-leading multimodal capabilities and strong benchmark results.",
"ByteDance-Seed/Seed-OSS-36B-Instruct.description": "Seed-OSS is a family of open-source LLMs from ByteDance Seed, designed for strong long-context handling, reasoning, agent, and general abilities. Seed-OSS-36B-Instruct is a 36B instruction-tuned model with native ultra-long context for processing large documents or codebases. It is optimized for reasoning, code generation, and agent tasks (tool use) while retaining strong general ability. A key feature is \"Thinking Budget,\" allowing flexible reasoning length to improve efficiency.",
"DeepSeek-R1-Distill-Llama-70B.description": "DeepSeek R1, the larger and smarter model in the DeepSeek suite, is distilled into the Llama 70B architecture. Benchmarks and human evals show it is smarter than the base Llama 70B, especially on math and fact-precision tasks.",
"DeepSeek-R1-Distill-Qwen-1.5B.description": "A DeepSeek-R1 distilled model based on Qwen2.5-Math-1.5B. Reinforcement learning and cold-start data optimize reasoning performance, setting new multi-task benchmarks for open models.",
"DeepSeek-R1-Distill-Qwen-14B.description": "DeepSeek-R1-Distill models are fine-tuned from open-source models using sample data generated by DeepSeek-R1.",
"DeepSeek-R1-Distill-Qwen-32B.description": "DeepSeek-R1-Distill models are fine-tuned from open-source models using sample data generated by DeepSeek-R1.",
"DeepSeek-R1-Distill-Qwen-7B.description": "A DeepSeek-R1 distilled model based on Qwen2.5-Math-7B. Reinforcement learning and cold-start data optimize reasoning performance, setting new multi-task benchmarks for open models.",
"DeepSeek-R1.description": "DeepSeek-R1 applies large-scale reinforcement learning during post-training, greatly boosting reasoning with very little labeled data. It matches the OpenAI o1 production model on math, code, and natural language reasoning tasks.",
"DeepSeek-V3-1.description": "DeepSeek V3.1 is a next-gen reasoning model with improved complex reasoning and chain-of-thought, suited for deep analysis tasks.",
"DeepSeek-V3-Fast.description": "Provider: sophnet. DeepSeek V3 Fast is the high-TPS version of DeepSeek V3 0324, full-precision (non-quantized) with stronger code and math and faster responses.",
"DeepSeek-V3.1-Fast.description": "DeepSeek V3.1 Fast is the high-TPS fast variant of DeepSeek V3.1. Hybrid thinking mode: via chat templates, one model supports both thinking and non-thinking. Smarter tool use: post-training boosts tool and agent task performance.",
"DeepSeek-V3.1-Think.description": "DeepSeek-V3.1 thinking mode: a new hybrid reasoning model with thinking and non-thinking modes, more efficient than DeepSeek-R1-0528. Post-training optimizations significantly improve agent tool use and agent task performance.",
"DeepSeek-V3.description": "DeepSeek-V3 is a MoE model developed by DeepSeek. It surpasses other open models like Qwen2.5-72B and Llama-3.1-405B on many benchmarks and is competitive with leading closed models such as GPT-4o and Claude 3.5 Sonnet.",
"Doubao-lite-128k.description": "Doubao-lite offers ultra-fast responses and better value, with flexible options across scenarios. Supports 128K context for inference and fine-tuning.",
"Doubao-lite-32k.description": "Doubao-lite offers ultra-fast responses and better value, with flexible options across scenarios. Supports 32K context for inference and fine-tuning.",
"Doubao-lite-4k.description": "Doubao-lite offers ultra-fast responses and better value, with flexible options across scenarios. Supports 4K context for inference and fine-tuning.",
"Doubao-pro-128k.description": "Best-performing flagship model for complex tasks, strong in reference QA, summarization, creation, classification, and roleplay. Supports 128K context for inference and fine-tuning.",
"Doubao-pro-32k.description": "Best-performing flagship model for complex tasks, strong in reference QA, summarization, creation, classification, and roleplay. Supports 32K context for inference and fine-tuning.",
"Doubao-pro-4k.description": "Best-performing flagship model for complex tasks, strong in reference QA, summarization, creation, classification, and roleplay. Supports 4K context for inference and fine-tuning.",
"DreamO.description": "DreamO is an open-source image customization model jointly developed by ByteDance and Peking University, using a unified architecture to support multi-task image generation. It employs efficient compositional modeling to generate highly consistent, customized images based on user-specified identity, subject, style, background, and other conditions.",
"ERNIE-3.5-128K.description": "Baidus flagship large-scale LLM trained on massive Chinese/English corpora with strong general ability for chat, creation, and plugin use; supports automatic Baidu Search plugin integration for fresh answers.",
"ERNIE-3.5-8K-Preview.description": "Baidus flagship large-scale LLM trained on massive Chinese/English corpora with strong general ability for chat, creation, and plugin use; supports automatic Baidu Search plugin integration for fresh answers.",
"ERNIE-3.5-8K.description": "Baidus flagship large-scale LLM trained on massive Chinese/English corpora with strong general ability for chat, creation, and plugin use; supports automatic Baidu Search plugin integration for fresh answers.",
"ERNIE-4.0-8K-Latest.description": "Baidus flagship ultra-large LLM with comprehensive upgrades over ERNIE 3.5, suitable for complex tasks across domains; supports Baidu Search plugin integration for fresh answers.",
"ERNIE-4.0-8K-Preview.description": "Baidus flagship ultra-large LLM with comprehensive upgrades over ERNIE 3.5, suitable for complex tasks across domains; supports Baidu Search plugin integration for fresh answers.",
"ERNIE-4.0-Turbo-8K-Latest.description": "Baidus flagship ultra-large LLM with strong overall performance for complex tasks, with Baidu Search plugin integration for fresh answers. It outperforms ERNIE 4.0.",
"ERNIE-4.0-Turbo-8K-Preview.description": "Baidus flagship ultra-large LLM with strong overall performance for complex tasks, with Baidu Search plugin integration for fresh answers. It outperforms ERNIE 4.0.",
"ERNIE-Character-8K.description": "Baidus vertical-domain LLM for game NPCs, customer service, and roleplay, with clearer persona consistency, stronger instruction following, and better reasoning.",
"ERNIE-Lite-Pro-128K.description": "Baidus lightweight LLM balancing quality and inference performance, better than ERNIE Lite and suitable for low-compute accelerators.",
"ERNIE-Speed-128K.description": "Baidus latest high-performance LLM (2024) with strong general ability, suitable as a base for fine-tuning to handle specific scenarios, with excellent reasoning performance.",
"ERNIE-Speed-Pro-128K.description": "Baidus latest high-performance LLM (2024) with strong general ability, better than ERNIE Speed, suitable as a base for fine-tuning with excellent reasoning performance.",
"FLUX-1.1-pro.description": "FLUX.1.1 Pro",
"FLUX.1-Kontext-dev.description": "FLUX.1-Kontext-dev is a multimodal image generation and editing model from Black Forest Labs based on a Rectified Flow Transformer architecture with 12B parameters. It focuses on generating, reconstructing, enhancing, or editing images under given context conditions. It combines the controllable generation strengths of diffusion models with Transformer context modeling, supporting high-quality outputs for tasks like inpainting, outpainting, and visual scene reconstruction.",
"FLUX.1-Kontext-pro.description": "FLUX.1 Kontext [pro]",
"FLUX.1-dev.description": "FLUX.1-dev is an open-source multimodal language model (MLLM) from Black Forest Labs, optimized for image-text tasks and combining image/text understanding and generation. Built on advanced LLMs (such as Mistral-7B), it uses a carefully designed vision encoder and multi-stage instruction tuning to enable multimodal coordination and complex task reasoning.",
"Gryphe/MythoMax-L2-13b.description": "MythoMax-L2 (13B) is an innovative model for diverse domains and complex tasks.",
"HelloMeme.description": "HelloMeme is an AI tool that generates memes, GIFs, or short videos from the images or motions you provide. It requires no drawing or coding skills—just a reference image—to produce fun, attractive, and stylistically consistent content.",
"HiDream-E1-Full.description": "HiDream-E1-Full is an open-source multimodal image editing model from HiDream.ai, based on an advanced Diffusion Transformer architecture and strong language understanding (built-in LLaMA 3.1-8B-Instruct). It supports natural-language-driven image generation, style transfer, local edits, and repainting, with excellent image-text understanding and execution.",
"HiDream-I1-Full.description": "HiDream-I1 is a new open-source base image generation model released by HiDream. With 17B parameters (Flux has 12B), it can deliver industry-leading image quality in seconds.",
"HunyuanDiT-v1.2-Diffusers-Distilled.description": "hunyuandit-v1.2-distilled is a lightweight text-to-image model optimized via distillation to generate high-quality images quickly, especially suited for low-resource environments and real-time generation.",
"InstantCharacter.description": "InstantCharacter is a tuning-free personalized character generation model released by Tencent AI in 2025, aiming for high-fidelity, cross-scenario consistent character generation. It can model a character from a single reference image and flexibly transfer it across styles, actions, and backgrounds.",
"InternVL2-8B.description": "InternVL2-8B is a powerful vision-language model supporting multimodal image-text processing, accurately recognizing image content and generating relevant descriptions or answers.",
"InternVL2.5-26B.description": "InternVL2.5-26B is a powerful vision-language model supporting multimodal image-text processing, accurately recognizing image content and generating relevant descriptions or answers.",
"Kolors.description": "Kolors is a text-to-image model developed by the Kuaishou Kolors team. Trained with billions of parameters, it has notable advantages in visual quality, Chinese semantic understanding, and text rendering.",
"Kwai-Kolors/Kolors.description": "Kolors is a large-scale latent-diffusion text-to-image model by the Kuaishou Kolors team. Trained on billions of text-image pairs, it excels in visual quality, complex semantic accuracy, and Chinese/English text rendering, with strong Chinese content understanding and generation.",
"Kwaipilot/KAT-Dev.description": "KAT-Dev (32B) is an open-source 32B model for software engineering tasks. It achieves a 62.4% solve rate on SWE-Bench Verified, ranking 5th among open models. It is optimized through mid-training, SFT, and RL for code completion, bug fixing, and code review.",
"Llama-3.2-11B-Vision-Instruct.description": "Strong image reasoning on high-resolution images, suited for visual understanding applications.",
"Llama-3.2-90B-Vision-Instruct\t.description": "Advanced image reasoning for visual-understanding agent applications.",
"LongCat-Flash-Chat.description": "The LongCat-Flash-Chat model has been upgraded to a new version. This update involves enhancements to model capabilities only; the model name and API invocation method remain unchanged. Building upon its hallmark “extreme efficiency” and “lightning-fast response,” the new version further strengthens contextual understanding and real-world programming performance: Significantly Enhanced Coding Capabilities: Deeply optimized for developer-centric scenarios, the model delivers substantial improvements in code generation, debugging, and explanation tasks. Developers are strongly encouraged to evaluate and benchmark these enhancements. Support for 256K Ultra-Long Context: The context window has doubled from the previous generation (128K) to 256K, enabling efficient processing of massive documents and long-sequence tasks. Comprehensively Improved Multilingual Performance: Provides strong support for nine languages, including Spanish, French, Arabic, Portuguese, Russian, and Indonesian. More Powerful Agent Capabilities: Demonstrates greater robustness and efficiency in complex tool invocation and multi-step task execution.",
"LongCat-Flash-Lite.description": "The LongCat-Flash-Lite model has been officially released. It adopts an efficient Mixture-of-Experts (MoE) architecture, with 68.5 billion total parameters and approximately 3 billion activated parameters. Through the use of an N-gram embedding table, it achieves highly efficient parameter utilization, and it is deeply optimized for inference efficiency and specific application scenarios. Compared to models of a similar scale, its core features are as follows:Outstanding Inference Efficiency: By leveraging the N-gram embedding table to fundamentally alleviate the I/O bottleneck inherent in MoE architectures, combined with dedicated caching mechanisms and kernel-level optimizations, it significantly reduces inference latency and improves overall efficiency. Strong Agent and Code Performance: It demonstrates highly competitive capabilities in tool invocation and software development tasks, delivering exceptional performance relative to its model size.",
"LongCat-Flash-Thinking-2601.description": "The LongCat-Flash-Thinking-2601 model has been officially released. As an upgraded reasoning model built on a Mixture-of-Experts (MoE) architecture, it features a total of 560 billion parameters. While maintaining strong competitiveness across traditional reasoning benchmarks, it systematically enhances Agent-level reasoning capabilities through large-scale multi-environment reinforcement learning. Compared to the LongCat-Flash-Thinking model, the key upgrades are as follows: Extreme Robustness in Noisy Environments: Through systematic curriculum-style training targeting noise and uncertainty in real-world settings, the model demonstrates outstanding performance in Agent tool invocation, Agent-based search, and tool-integrated reasoning, with significantly improved generalization. Powerful Agent Capabilities: By constructing a tightly coupled dependency graph encompassing more than 60 tools, and scaling training through multi-environment expansion and large-scale exploratory learning, the model markedly improves its ability to generalize to complex and out-of-distribution real-world scenarios. Advanced Deep Thinking Mode: It expands the breadth of reasoning via parallel inference and deepens analytical capability through recursive feedback-driven summarization and abstraction mechanisms, effectively addressing highly challenging problems.",
"LongCat-Flash-Thinking.description": "LongCat-Flash-Thinking has been officially released and open-sourced simultaneously. It is a deep reasoning model that can be used for free conversations within LongCat Chat, or accessed via API by specifying model=LongCat-Flash-Thinking.",
"Meta-Llama-3-3-70B-Instruct.description": "Llama 3.3 70B is a versatile Transformer model for chat and generation tasks.",
"Meta-Llama-3.1-405B-Instruct.description": "Llama 3.1 instruction-tuned text model optimized for multilingual chat, performing strongly on common industry benchmarks among open and closed chat models.",
"Meta-Llama-3.1-70B-Instruct.description": "Llama 3.1 instruction-tuned text model optimized for multilingual chat, performing strongly on common industry benchmarks among open and closed chat models.",
"Meta-Llama-3.1-8B-Instruct.description": "Llama 3.1 instruction-tuned text model optimized for multilingual chat, performing strongly on common industry benchmarks among open and closed chat models.",
"Meta-Llama-3.2-1B-Instruct.description": "Cutting-edge small language model with strong language understanding, excellent reasoning, and text generation.",
"Meta-Llama-3.2-3B-Instruct.description": "Cutting-edge small language model with strong language understanding, excellent reasoning, and text generation.",
"Meta-Llama-3.3-70B-Instruct.description": "Llama 3.3 is the most advanced multilingual open-source Llama model, delivering near-405B performance at very low cost. It is Transformer-based and improved with SFT and RLHF for usefulness and safety. The instruction-tuned version is optimized for multilingual chat and beats many open and closed chat models on industry benchmarks. Knowledge cutoff: Dec 2023.",
"Meta-Llama-4-Maverick-17B-128E-Instruct-FP8.description": "Llama 4 Maverick is a large MoE model with efficient expert activation for strong reasoning performance.",
"MiniMax-M1.description": "A new in-house reasoning model with 80K chain-of-thought and 1M input, delivering performance comparable to top global models.",
"MiniMax-M2-Stable.description": "Built for efficient coding and agent workflows, with higher concurrency for commercial use.",
"MiniMax-M2.1-Lightning.description": "Powerful multilingual programming capabilities, comprehensively upgraded programming experience. Faster and more efficient.",
"MiniMax-M2.1-highspeed.description": "Powerful multilingual programming capabilities with faster and more efficient inference.",
"MiniMax-M2.1.description": "MiniMax-M2.1 is a flagship open-source large model from MiniMax, focusing on solving complex real-world tasks. Its core strengths are multi-language programming capabilities and the ability to solve complex tasks as an Agent.",
"MiniMax-M2.5-Lightning.description": "M2.5 Lightning: Same performance, faster and more agile (approx. 100 tps).",
"MiniMax-M2.5-highspeed.description": "Same performance as M2.5 with significantly faster inference.",
"MiniMax-M2.5.description": "MiniMax-M2.5 is a flagship open-source large model from MiniMax, focusing on solving complex real-world tasks. Its core strengths are multi-language programming capabilities and the ability to solve complex tasks as an Agent.",
"MiniMax-M2.7-highspeed.description": "Same performance as M2.7 with significantly faster inference (~100 tps).",
"MiniMax-M2.7.description": "First self-evolving model with top-tier coding and agentic performance (~60 tps).",
"MiniMax-M2.description": "Built specifically for efficient coding and Agent workflows",
"MiniMax-Text-01.description": "MiniMax-01 introduces large-scale linear attention beyond classic Transformers, with 456B parameters and 45.9B activated per pass. It achieves top-tier performance and supports up to 4M tokens of context (32× GPT-4o, 20× Claude-3.5-Sonnet).",
"MiniMaxAI/MiniMax-M1-80k.description": "MiniMax-M1 is an open-weights large-scale hybrid-attention reasoning model with 456B total parameters and ~45.9B active per token. It natively supports 1M context and uses Flash Attention to cut FLOPs by 75% on 100K-token generation vs DeepSeek R1. With an MoE architecture plus CISPO and hybrid-attention RL training, it achieves leading performance on long-input reasoning and real software engineering tasks.",
"MiniMaxAI/MiniMax-M2.description": "MiniMax-M2 redefines agent efficiency. It is a compact, fast, cost-effective MoE model with 230B total and 10B active parameters, built for top-tier coding and agent tasks while retaining strong general intelligence. With only 10B active parameters, it rivals much larger models, making it ideal for high-efficiency applications.",
"Moonshot-Kimi-K2-Instruct.description": "1T total parameters with 32B active. Among non-thinking models, it is top-tier in frontier knowledge, math, and coding, and stronger at general agent tasks. Optimized for agent workloads, it can take actions, not just answer questions. Best for improvisational, general chat, and agent experiences as a reflex-level model without long thinking.",
"NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO.description": "Nous Hermes 2 - Mixtral 8x7B-DPO (46.7B) is a high-precision instruction model for complex computation.",
"OmniConsistency.description": "OmniConsistency improves style consistency and generalization in image-to-image tasks by introducing large-scale Diffusion Transformers (DiTs) and paired stylized data, avoiding style degradation.",
"PaddlePaddle/PaddleOCR-VL-1.5.description": "PaddleOCR-VL-1.5 is an upgraded version of the PaddleOCR-VL series, achieving 94.5% accuracy on the OmniDocBench v1.5 document parsing benchmark, surpassing leading general large models and specialized document parsing models. It innovatively supports irregular bounding box localization for document elements, handling scanned, tilted, and screen-captured images effectively.",
"Phi-3-medium-128k-instruct.description": "The same Phi-3-medium model with a larger context window for RAG or few-shot prompts.",
"Phi-3-medium-4k-instruct.description": "A 14B-parameter model with higher quality than Phi-3-mini, focused on high-quality, reasoning-intensive data.",
"Phi-3-mini-128k-instruct.description": "The same Phi-3-mini model with a larger context window for RAG or few-shot prompts.",
"Phi-3-mini-4k-instruct.description": "The smallest Phi-3 family member, optimized for quality and low latency.",
"Phi-3-small-128k-instruct.description": "The same Phi-3-small model with a larger context window for RAG or few-shot prompts.",
"Phi-3-small-8k-instruct.description": "A 7B-parameter model with higher quality than Phi-3-mini, focused on high-quality, reasoning-intensive data.",
"Phi-3.5-mini-instruct.description": "An updated version of the Phi-3-mini model.",
"Phi-3.5-vision-instrust.description": "An updated version of the Phi-3-vision model.",
"Pro/MiniMaxAI/MiniMax-M2.1.description": "MiniMax-M2.1 is an open-source large language model optimized for agent capabilities, excelling in programming, tool usage, instruction following, and long-term planning. The model supports multilingual software development and complex multi-step workflow execution, achieving a score of 74.0 on SWE-bench Verified and surpassing Claude Sonnet 4.5 in multilingual scenarios.",
"Pro/MiniMaxAI/MiniMax-M2.5.description": "MiniMax-M2.5 is the latest large language model developed by MiniMax, trained through large-scale reinforcement learning across hundreds of thousands of complex, real-world environments. Featuring an MoE architecture with 229 billion parameters, it achieves industry-leading performance in tasks such as programming, agent tool-calling, search, and office scenarios.",
"Pro/Qwen/Qwen2-7B-Instruct.description": "Qwen2-7B-Instruct is a 7B instruction-tuned LLM in the Qwen2 series. It uses Transformer architecture with SwiGLU, attention QKV bias, and grouped-query attention, and handles large inputs. It performs strongly across language understanding, generation, multilingual tasks, coding, math, and reasoning, outperforming most open models and competing with proprietary ones. It surpasses Qwen1.5-7B-Chat on multiple benchmarks.",
"Pro/Qwen/Qwen2.5-7B-Instruct.description": "Qwen2.5-7B-Instruct is part of Alibaba Clouds latest LLM series. The 7B model brings notable gains in coding and math, supports 29+ languages, and improves instruction following, structured data understanding, and structured output (especially JSON).",
"Pro/Qwen/Qwen2.5-Coder-7B-Instruct.description": "Qwen2.5-Coder-7B-Instruct is the latest Alibaba Cloud code-focused LLM. Built on Qwen2.5 and trained on 5.5T tokens, it significantly improves code generation, reasoning, and repair while retaining math and general strengths, providing a solid base for coding agents.",
"Pro/Qwen/Qwen2.5-VL-7B-Instruct.description": "Qwen2.5-VL is a new Qwen vision-language model with strong visual understanding. It analyzes text, charts, and layouts in images, understands long videos and events, supports reasoning and tool use, multi-format object grounding, and structured outputs. It improves dynamic resolution and frame-rate training for video understanding and boosts vision encoder efficiency.",
"Pro/THUDM/GLM-4.1V-9B-Thinking.description": "GLM-4.1V-9B-Thinking is an open-source VLM from Zhipu AI and Tsinghua KEG Lab, designed for complex multimodal cognition. Built on GLM-4-9B-0414, it adds chain-of-thought reasoning and RL to significantly improve cross-modal reasoning and stability.",
"Pro/THUDM/glm-4-9b-chat.description": "GLM-4-9B-Chat is the open-source GLM-4 model from Zhipu AI. It performs strongly across semantics, math, reasoning, code, and knowledge. Beyond multi-turn chat, it supports web browsing, code execution, custom tool calls, and long-text reasoning. It supports 26 languages (including Chinese, English, Japanese, Korean, German). It performs well on AlignBench-v2, MT-Bench, MMLU, and C-Eval, and supports up to 128K context for academic and business use.",
"Pro/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B.description": "DeepSeek-R1-Distill-Qwen-7B is distilled from Qwen2.5-Math-7B and fine-tuned on 800K curated DeepSeek-R1 samples. It performs strongly, with 92.8% on MATH-500, 55.5% on AIME 2024, and a 1189 CodeForces rating for a 7B model.",
"Pro/deepseek-ai/DeepSeek-R1.description": "DeepSeek-R1 is an RL-driven reasoning model that reduces repetition and improves readability. It uses cold-start data before RL to further boost reasoning, matches OpenAI-o1 on math, code, and reasoning tasks, and improves overall results through careful training.",
"Pro/deepseek-ai/DeepSeek-V3.1-Terminus.description": "DeepSeek-V3.1-Terminus is an updated V3.1 model positioned as a hybrid agent LLM. It fixes user-reported issues and improves stability, language consistency, and reduces mixed Chinese/English and abnormal characters. It integrates Thinking and Non-thinking modes with chat templates for flexible switching. It also improves Code Agent and Search Agent performance for more reliable tool use and multi-step tasks.",
"Pro/deepseek-ai/DeepSeek-V3.2.description": "DeepSeek-V3.2 is a model that combines high computational efficiency with excellent reasoning and Agent performance. Its approach is built on three key technological breakthroughs: DeepSeek Sparse Attention (DSA), an efficient attention mechanism that significantly reduces computational complexity while maintaining model performance, and is specifically optimized for long-context scenarios; a scalable reinforcement learning framework through which model performance can rival GPT-5, with its high-compute version matching Gemini-3.0-Pro in reasoning capabilities; and a large-scale Agent task synthesis pipeline aimed at integrating reasoning capabilities into tool use scenarios, thereby improving instruction following and generalization in complex interactive environments. The model achieved gold medal performance in the 2025 International Mathematical Olympiad (IMO) and International Olympiad in Informatics (IOI).",
"Pro/deepseek-ai/DeepSeek-V3.description": "DeepSeek-V3 is a 671B-parameter MoE model using MLA and DeepSeekMoE with loss-free load balancing for efficient inference and training. Pretrained on 14.8T high-quality tokens and further tuned with SFT and RL, it outperforms other open models and approaches leading closed models.",
"Pro/moonshotai/Kimi-K2-Instruct-0905.description": "Kimi K2-Instruct-0905 is the newest and most powerful Kimi K2. It is a top-tier MoE model with 1T total and 32B active parameters. Key features include stronger agentic coding intelligence with significant gains on benchmarks and real-world agent tasks, plus improved frontend coding aesthetics and usability.",
"Pro/moonshotai/Kimi-K2-Thinking.description": "Kimi K2 Thinking Turbo is the Turbo variant optimized for reasoning speed and throughput while retaining K2 Thinkings multi-step reasoning and tool use. It is an MoE model with ~1T total parameters, native 256K context, and stable large-scale tool calling for production scenarios with stricter latency and concurrency needs.",
"Pro/moonshotai/Kimi-K2.5.description": "Kimi K2.5 is an open-source native multimodal agent model, built on Kimi-K2-Base, trained on approximately 1.5 trillion mixed vision and text tokens. The model adopts an MoE architecture with 1T total parameters and 32B active parameters, supporting a 256K context window, seamlessly integrating vision and language understanding capabilities.",
"Pro/zai-org/glm-4.7.description": "GLM-4.7 is Zhipu's new generation flagship model with 355B total parameters and 32B active parameters, fully upgraded in general dialogue, reasoning, and agent capabilities. GLM-4.7 enhances Interleaved Thinking and introduces Preserved Thinking and Turn-level Thinking.",
"Pro/zai-org/glm-5.description": "GLM-5 is Zhipu's next-generation large language model, focusing on complex system engineering and long-duration Agent tasks. The model parameters have been expanded to 744B (40B active) and integrate DeepSeek Sparse Attention.",
"QwQ-32B-Preview.description": "Qwen QwQ is an experimental research model focused on improving reasoning.",
"Qwen/QVQ-72B-Preview.description": "QVQ-72B-Preview is a research model from Qwen focused on visual reasoning, with strengths in complex scene understanding and visual math problems.",
"Qwen/QwQ-32B-Preview.description": "Qwen QwQ is an experimental research model focused on improved AI reasoning.",
"Qwen/QwQ-32B.description": "QwQ is a reasoning model in the Qwen family. Compared with standard instruction-tuned models, it adds thinking and reasoning that significantly boost downstream performance, especially on hard problems. QwQ-32B is a mid-size reasoning model competitive with top reasoning models like DeepSeek-R1 and o1-mini. It uses RoPE, SwiGLU, RMSNorm, and attention QKV bias, with 64 layers and 40 Q attention heads (8 KV in GQA).",
"Qwen/Qwen-Image-Edit-2509.description": "Qwen-Image-Edit-2509 is the latest editing version of Qwen-Image from the Qwen team. Built on the 20B Qwen-Image model, it extends strong text rendering into image editing for precise text edits. It uses a dual-control architecture, sending inputs to Qwen2.5-VL for semantic control and a VAE encoder for appearance control, enabling both semantic- and appearance-level editing. It supports local edits (add/remove/modify) and higher-level semantic edits like IP creation and style transfer while preserving semantics. It achieves SOTA results on multiple benchmarks.",
"Qwen/Qwen-Image.description": "Qwen-Image is a 20B-parameter image generation foundation model from the Qwen team. It makes major gains in complex text rendering and precise image editing, especially for high-fidelity Chinese/English text. It supports multi-line and paragraph layouts while keeping typography coherent. Beyond text rendering, it supports a wide range of styles from photorealistic to anime, and advanced editing like style transfer, object add/remove, detail enhancement, text editing, and pose control, aiming to be a comprehensive visual creation foundation.",
"Qwen/Qwen2-72B-Instruct.description": "Qwen 2 Instruct (72B) delivers precise instruction following for enterprise workloads.",
"Qwen/Qwen2-7B-Instruct.description": "Qwen2-7B-Instruct is a 7B instruction-tuned model in the Qwen2 series using Transformer, SwiGLU, QKV bias, and grouped-query attention. It handles large inputs and performs strongly across understanding, generation, multilingual, coding, math, and reasoning benchmarks, outperforming most open models and surpassing Qwen1.5-7B-Chat in multiple evaluations.",
"Qwen/Qwen2-VL-72B-Instruct.description": "Qwen2-VL is the latest Qwen-VL model, reaching SOTA on vision benchmarks like MathVista, DocVQA, RealWorldQA, and MTVQA. It can understand videos over 20 minutes for video QA, dialogue, and content creation. It also supports complex reasoning and decision-making, integrating with devices/robots for vision-driven actions. Beyond English and Chinese, it can read text in many languages including most European languages, Japanese, Korean, Arabic, and Vietnamese.",
"Qwen/Qwen2.5-14B-Instruct.description": "Qwen2.5-14B-Instruct is part of Alibaba Clouds latest LLM series. The 14B model brings notable gains in coding and math, supports 29+ languages, and improves instruction following, structured data understanding, and structured output (especially JSON).",
"Qwen/Qwen2.5-32B-Instruct.description": "Qwen2.5-32B-Instruct is part of Alibaba Clouds latest LLM series. The 32B model brings notable gains in coding and math, supports 29+ languages, and improves instruction following, structured data understanding, and structured output (especially JSON).",
"Qwen/Qwen2.5-72B-Instruct-128K.description": "Qwen2.5-72B-Instruct is part of Alibaba Clouds latest LLM series. The 72B model improves coding and math, supports up to 128K input and over 8K output, offers 29+ languages, and improves instruction following and structured output (especially JSON).",
"Qwen/Qwen2.5-72B-Instruct-Turbo.description": "Qwen2.5 is a new LLM family optimized for instruction-style tasks.",
"Qwen/Qwen2.5-72B-Instruct.description": "Qwen2.5-72B-Instruct is part of Alibaba Clouds latest LLM series. The 72B model brings notable gains in coding and math, supports 29+ languages, and improves instruction following, structured data understanding, and structured output (especially JSON).",
"Qwen/Qwen2.5-7B-Instruct-Turbo.description": "Qwen2.5 is a new LLM family optimized for instruction-style tasks.",
"Qwen/Qwen2.5-7B-Instruct.description": "Qwen2.5-7B-Instruct is part of Alibaba Clouds latest LLM series. The 7B model brings notable gains in coding and math, supports 29+ languages, and improves instruction following, structured data understanding, and structured output (especially JSON).",
"Qwen/Qwen2.5-Coder-32B-Instruct.description": "Qwen2.5 Coder 32B Instruct is the latest Alibaba Cloud code-focused LLM. Built on Qwen2.5 and trained on 5.5T tokens, it significantly improves code generation, reasoning, and repair while retaining math and general strengths, providing a strong base for coding agents.",
"Qwen/Qwen2.5-Coder-7B-Instruct.description": "Qwen2.5-Coder-7B-Instruct is the latest Alibaba Cloud code-focused LLM. Built on Qwen2.5 and trained on 5.5T tokens, it significantly improves code generation, reasoning, and repair while retaining math and general strengths, providing a solid base for coding agents.",
"Qwen/Qwen2.5-VL-32B-Instruct.description": "Qwen2.5-VL-32B-Instruct is a multimodal model from the Qwen team. It recognizes common objects and analyzes text, charts, icons, graphics, and layouts. As a visual agent, it can reason and dynamically control tools, including computer and phone use. It precisely localizes objects and generates structured outputs for invoices and tables. Compared to Qwen2-VL, RL further improves math and problem-solving, with more human-preferred responses.",
"Qwen/Qwen2.5-VL-72B-Instruct.description": "Qwen2.5-VL is the vision-language model in the Qwen2.5 series with major upgrades: stronger visual understanding for objects, text, charts, and layouts; reasoning as a visual agent with dynamic tool use; understanding videos over 1 hour and capturing key events; precise object grounding via boxes or points; and structured outputs for scanned data like invoices and tables.",
"Qwen/Qwen3-14B.description": "Qwen3 is a next-gen Tongyi Qwen model with major gains in reasoning, general ability, agent capability, and multilingual performance, and supports switching thinking modes.",
"Qwen/Qwen3-235B-A22B-Instruct-2507.description": "Qwen3-235B-A22B-Instruct-2507 is a flagship Qwen3 MoE model with 235B total and 22B active parameters. It is an updated non-thinking version focused on improving instruction following, logical reasoning, text understanding, math, science, coding, and tool use. It also expands multilingual long-tail knowledge and better aligns with user preferences for subjective open-ended tasks.",
"Qwen/Qwen3-235B-A22B-Thinking-2507.description": "Qwen3-235B-A22B-Thinking-2507 is a Qwen3 model focused on hard complex reasoning. It uses an MoE architecture with 235B total and ~22B active per token to boost efficiency. As a dedicated thinking model, it shows major gains in logic, math, science, coding, and academic benchmarks, reaching top-tier open thinking performance. It also improves instruction following, tool use, and text generation, and natively supports 256K context for deep reasoning and long documents.",
"Qwen/Qwen3-235B-A22B.description": "Qwen3 235B A22B is the Qwen3 ultra-scale model delivering top-tier AI capability.",
"Qwen/Qwen3-30B-A3B-Instruct-2507.description": "Qwen3-30B-A3B-Instruct-2507 is the updated non-thinking version of Qwen3-30B-A3B. It is an MoE model with 30.5B total and 3.3B active parameters. It significantly improves instruction following, logical reasoning, text understanding, math, science, coding, and tool use, expands multilingual long-tail knowledge, and better aligns with user preferences on subjective open tasks. It supports 256K context. This model is non-thinking only and will not output `<think></think>` tags.",
"Qwen/Qwen3-30B-A3B-Thinking-2507.description": "Qwen3-30B-A3B-Thinking-2507 is the latest thinking model in the Qwen3 series. It is an MoE model with 30.5B total and 3.3B active parameters, focused on complex tasks. It shows significant gains in logic, math, science, coding, and academic benchmarks, and improves instruction following, tool use, text generation, and preference alignment. It natively supports 256K context and can extend to 1M tokens. This version is designed for thinking mode with detailed step-by-step reasoning and strong agent capabilities.",
"Qwen/Qwen3-32B.description": "Qwen3 is a next-gen Tongyi Qwen model with major gains in reasoning, general ability, agent capability, and multilingual performance, and supports switching thinking modes.",
"Qwen/Qwen3-8B.description": "Qwen3 is a next-gen Tongyi Qwen model with major gains in reasoning, general ability, agent capability, and multilingual performance, and supports switching thinking modes.",
"Qwen/Qwen3-Coder-30B-A3B-Instruct.description": "Qwen3-Coder-30B-A3B-Instruct is a Qwen3 code model from the Qwen team. It is streamlined for high performance and efficiency while boosting code capabilities. It shows strong advantages on agentic coding, automated browser operations, and tool use among open models. It natively supports 256K context and can extend to 1M tokens for codebase-level understanding. It powers agentic coding on platforms like Qwen Code and CLINE with a dedicated function-calling format.",
"Qwen/Qwen3-Coder-480B-A35B-Instruct.description": "Qwen3-Coder-480B-A35B-Instruct is Alibabas most agentic code model to date. It is an MoE model with 480B total and 35B active parameters, balancing efficiency and performance. It natively supports 256K context and can extend to 1M tokens via YaRN, enabling large codebase handling. Designed for agentic coding workflows, it can interact with tools and environments to solve complex programming tasks. It achieves top open-model results on coding and agent benchmarks, comparable to leading models like Claude Sonnet 4.",
"Qwen/Qwen3-Next-80B-A3B-Instruct.description": "Qwen3-Next-80B-A3B-Instruct is a next-gen base model using the Qwen3-Next architecture for extreme training and inference efficiency. It combines hybrid attention (Gated DeltaNet + Gated Attention), highly sparse MoE, and training stability optimizations. With 80B total parameters but ~3B active at inference, it reduces compute and delivers 10x+ throughput over Qwen3-32B on >32K contexts. This instruction-tuned version targets general tasks (no Thinking mode). It performs comparably to Qwen3-235B on some benchmarks and shows strong advantages in ultra-long context tasks.",
"Qwen/Qwen3-Next-80B-A3B-Thinking.description": "Qwen3-Next-80B-A3B-Thinking is a next-gen base model for complex reasoning. It uses the Qwen3-Next architecture with hybrid attention (Gated DeltaNet + Gated Attention) and highly sparse MoE for extreme training/inference efficiency. With 80B total parameters but ~3B active at inference, it cuts compute and delivers 10x+ throughput over Qwen3-32B on >32K contexts. This Thinking version targets multi-step tasks like proofs, code synthesis, logic analysis, and planning, outputting structured chain-of-thought. It outperforms Qwen3-32B-Thinking and beats Gemini-2.5-Flash-Thinking on several benchmarks.",
"Qwen/Qwen3-Omni-30B-A3B-Captioner.description": "Qwen3-Omni-30B-A3B-Captioner is a Qwen3-series VLM built for high-quality, detailed, accurate image captions. It uses a 30B-parameter MoE architecture to deeply understand images and produce fluent descriptions, excelling at detail capture, scene understanding, object recognition, and relational reasoning.",
"Qwen/Qwen3-Omni-30B-A3B-Instruct.description": "Qwen3-Omni-30B-A3B-Instruct is a Qwen3-series MoE model with 30B total and 3B active parameters, delivering strong performance at lower inference cost. Trained on high-quality multi-source multilingual data, it supports full-modal inputs (text, images, audio, video) and cross-modal understanding and generation.",
"Qwen/Qwen3-Omni-30B-A3B-Thinking.description": "Qwen3-Omni-30B-A3B-Thinking is the core \"Thinker\" component of Qwen3-Omni. It processes multimodal inputs (text, audio, images, video) and performs complex chain-of-thought reasoning, unifying inputs into a shared representation for deep cross-modal understanding. It is an MoE model with 30B total and 3B active parameters, balancing strong reasoning and compute efficiency.",
"Qwen/Qwen3-VL-235B-A22B-Instruct.description": "Qwen3-VL-235B-A22B-Instruct is a large instruction-tuned Qwen3-VL model built on MoE, delivering excellent multimodal understanding and generation. It natively supports 256K context and is suitable for high-concurrency production multimodal services.",
"Qwen/Qwen3-VL-235B-A22B-Thinking.description": "Qwen3-VL-235B-A22B-Thinking is the flagship thinking version of Qwen3-VL, optimized for complex multimodal reasoning, long-context reasoning, and agent interaction in enterprise scenarios.",
"Qwen/Qwen3-VL-30B-A3B-Instruct.description": "Qwen3-VL-30B-A3B-Instruct is the instruction-tuned Qwen3-VL model with strong vision-language understanding and generation. It natively supports 256K context for multimodal chat and image-conditioned generation.",
"Qwen/Qwen3-VL-30B-A3B-Thinking.description": "Qwen3-VL-30B-A3B-Thinking is the reasoning-enhanced version of Qwen3-VL, optimized for multimodal reasoning, image-to-code, and complex visual understanding. It supports 256K context with stronger chain-of-thought ability.",
"Qwen/Qwen3-VL-32B-Instruct.description": "Qwen3-VL-32B-Instruct is a vision-language model from the Qwen team with leading SOTA results on multiple VL benchmarks. It supports megapixel-resolution images and offers strong visual understanding, multilingual OCR, fine-grained visual grounding, and visual dialogue. It handles complex multimodal tasks and supports tool calling and prefix completion.",
"Qwen/Qwen3-VL-32B-Thinking.description": "Qwen3-VL-32B-Thinking is optimized for complex visual reasoning. It includes a built-in thinking mode that generates intermediate reasoning steps before answers, boosting multi-step logic, planning, and complex reasoning. It supports megapixel images, strong visual understanding, multilingual OCR, fine-grained grounding, visual dialogue, tool calling, and prefix completion.",
"Qwen/Qwen3-VL-8B-Instruct.description": "Qwen3-VL-8B-Instruct is a Qwen3 vision-language model built on Qwen3-8B-Instruct and trained on large image-text data. It excels at general visual understanding, vision-centric dialogue, and multilingual text recognition in images, suitable for visual QA, captioning, multimodal instruction following, and tool use.",
"Qwen/Qwen3-VL-8B-Thinking.description": "Qwen3-VL-8B-Thinking is the visual thinking version of Qwen3, optimized for complex multi-step reasoning. It generates a thinking chain before answers to improve accuracy, ideal for deep visual QA and detailed image analysis.",
"Qwen/Qwen3.5-122B-A10B.description": "Qwen3.5-122B-A10B is a native multimodal large language model from the Qwen team with 122B total parameters and only 10B active parameters. It adopts an efficient hybrid architecture combining Gated Delta Networks and Sparse Mixture-of-Experts (MoE), natively supporting 256K context length with extensibility to approximately 1M tokens.",
"Qwen/Qwen3.5-27B.description": "Qwen3.5-27B is a native multimodal large language model from the Qwen team with 27B parameters. It adopts an efficient hybrid architecture combining Gated Delta Networks and Gated Attention, natively supporting 256K context length with extensibility to approximately 1M tokens.",
"Qwen/Qwen3.5-35B-A3B.description": "Qwen3.5-35B-A3B is a native multimodal large language model from the Qwen team with 35B total parameters and only 3B active parameters. It adopts an efficient hybrid architecture combining Gated Delta Networks and Sparse Mixture-of-Experts (MoE), natively supporting 256K context length with extensibility to approximately 1M tokens.",
"Qwen/Qwen3.5-397B-A17B.description": "Qwen3.5-397B-A17B is the latest vision-language model in the Qwen3.5 series, using a Mixture-of-Experts (MoE) architecture with 397B total parameters and 17B active parameters. It natively supports 256K context length with extensibility to approximately 1M tokens, supports 201 languages, and provides unified vision-language understanding, tool calling, and reasoning capabilities.",
"Qwen/Qwen3.5-4B.description": "Qwen3.5-4B is a native multimodal large language model from the Qwen team with 4B parameters, the most lightweight Dense model in the Qwen3.5 series. It adopts an efficient hybrid architecture combining Gated Delta Networks and Gated Attention, natively supporting 256K context length with extensibility to approximately 1M tokens.",
"Qwen/Qwen3.5-9B.description": "Qwen3.5-9B is a native multimodal large language model from the Qwen team with 9B parameters. As a lightweight Dense model in the Qwen3.5 series, it adopts an efficient hybrid architecture combining Gated Delta Networks and Gated Attention, natively supporting 256K context length with extensibility to approximately 1M tokens.",
"Qwen2-72B-Instruct.description": "Qwen2 is the latest Qwen series, supporting a 128k context window. Compared with todays best open models, Qwen2-72B significantly surpasses leading models in natural language understanding, knowledge, code, math, and multilingual capabilities.",
"Qwen2-7B-Instruct.description": "Qwen2 is the latest Qwen series, surpassing the best open models of similar size and even larger models. Qwen2 7B shows significant advantages on multiple benchmarks, especially in code and Chinese understanding.",
"Qwen2-VL-72B.description": "Qwen2-VL-72B is a powerful vision-language model supporting multimodal image-text processing, accurately recognizing image content and generating relevant descriptions or answers.",
"Qwen2.5-14B-Instruct.description": "Qwen2.5-14B-Instruct is a 14B-parameter LLM with strong performance, optimized for Chinese and multilingual scenarios, supporting intelligent Q&A and content generation.",
"Qwen2.5-32B-Instruct.description": "Qwen2.5-32B-Instruct is a 32B-parameter LLM with balanced performance, optimized for Chinese and multilingual scenarios, supporting intelligent Q&A and content generation.",
"Qwen2.5-72B-Instruct.description": "LLM for Chinese and English, tuned for language, coding, math, and reasoning.",
"Qwen2.5-7B-Instruct.description": "Qwen2.5-7B-Instruct is a 7B-parameter LLM that supports function calling and seamless external system integration, greatly improving flexibility and extensibility. It is optimized for Chinese and multilingual scenarios, supporting intelligent Q&A and content generation.",
"Qwen2.5-Coder-14B-Instruct.description": "Qwen2.5-Coder-14B-Instruct is a large-scale pre-trained coding instruction model with strong code understanding and generation. It efficiently handles a wide range of programming tasks, ideal for smart coding, automated script generation, and programming Q&A.",
"Qwen2.5-Coder-32B-Instruct.description": "Advanced LLM for code generation, reasoning, and bug fixing across major programming languages.",
"Qwen3-235B-A22B-Instruct-2507-FP8.description": "Qwen3 235B A22B Instruct 2507 is optimized for advanced reasoning and instruction-following, using MoE to keep reasoning efficient at scale.",
"Qwen3-235B.description": "Qwen3-235B-A22B is a MoE model that introduces a hybrid reasoning mode, letting users switch seamlessly between thinking and non-thinking. It supports understanding and reasoning across 119 languages and dialects and has strong tool-calling capabilities, competing with mainstream models like DeepSeek R1, OpenAI o1, o3-mini, Grok 3, and Google Gemini 2.5 Pro across benchmarks in general ability, code and math, multilingual capability, and knowledge reasoning.",
"Qwen3-32B.description": "Qwen3-32B is a dense model that introduces a hybrid reasoning mode, letting users switch between thinking and non-thinking. With architecture improvements, more data, and better training, it performs on par with Qwen2.5-72B.",
"SenseChat-128K.description": "Base V4 with 128K context, strong in long-text understanding and generation.",
"SenseChat-32K.description": "Base V4 with 32K context, flexible for many scenarios.",
"SenseChat-5-1202.description": "Latest version based on V5.5, with significant gains in Chinese/English fundamentals, chat, STEM knowledge, humanities knowledge, writing, math/logic, and length control.",
"SenseChat-5-Cantonese.description": "Designed for Hong Kong dialogue habits, slang, and local knowledge; surpasses GPT-4 in Cantonese understanding and rivals GPT-4 Turbo in knowledge, reasoning, math, and coding.",
"SenseChat-5-beta.description": "Some performance exceeds SenseChat-5-1202.",
"SenseChat-5.description": "Latest V5.5 with 128K context; major gains in math reasoning, English chat, instruction following, and long-text understanding, comparable to GPT-4o.",
"SenseChat-Character-Pro.description": "Advanced character chat model with 32K context, improved capability, and Chinese/English support.",
"SenseChat-Character.description": "Standard character chat model with 8K context and high response speed.",
"SenseChat-Turbo-1202.description": "Latest lightweight model reaching 90%+ of full-model capability with significantly lower inference cost.",
"SenseChat-Turbo.description": "Suitable for fast Q&A and model fine-tuning scenarios.",
"SenseChat-Vision.description": "Latest V5.5 with multi-image input and broad core improvements in attribute recognition, spatial relations, action/event detection, scene understanding, emotion recognition, commonsense reasoning, and text understanding/generation.",
"SenseChat.description": "Base V4 with 4K context and strong general capability.",
"SenseNova-V6-5-Pro.description": "With comprehensive updates to multimodal, language, and reasoning data plus training strategy optimization, the new model significantly improves multimodal reasoning and generalized instruction following, supports up to a 128k context window, and excels in OCR and cultural tourism IP recognition tasks.",
"SenseNova-V6-5-Turbo.description": "With comprehensive updates to multimodal, language, and reasoning data plus training strategy optimization, the new model significantly improves multimodal reasoning and generalized instruction following, supports up to a 128k context window, and excels in OCR and cultural tourism IP recognition tasks.",
"SenseNova-V6-Pro.description": "Natively unifies image, text, and video, breaking traditional multimodal silos; wins top spots on OpenCompass and SuperCLUE.",
"SenseNova-V6-Reasoner.description": "Combines vision and language deep reasoning, supporting slow thinking and full chain-of-thought.",
"SenseNova-V6-Turbo.description": "Natively unifies image, text, and video, breaking traditional multimodal silos. It leads across core multimodal and language capabilities and ranks top-tier in multiple evaluations.",
"Skylark2-lite-8k.description": "Skylark 2nd-gen model. Skylark2-lite has fast responses for real-time, cost-sensitive scenarios with lower accuracy needs, with an 8K context window.",
"Skylark2-pro-32k.description": "Skylark 2nd-gen model. Skylark2-pro offers higher accuracy for complex text generation such as professional copywriting, novel writing, and high-quality translation, with a 32K context window.",
"Skylark2-pro-4k.description": "Skylark 2nd-gen model. Skylark2-pro offers higher accuracy for complex text generation such as professional copywriting, novel writing, and high-quality translation, with a 4K context window.",
"Skylark2-pro-character-4k.description": "Skylark 2nd-gen model. Skylark2-pro-character excels at roleplay and chat, matching prompts with distinct persona styles and natural dialogue for chatbots, virtual assistants, and customer service, with fast responses.",
"Skylark2-pro-turbo-8k.description": "Skylark 2nd-gen model. Skylark2-pro-turbo-8k offers faster inference at lower cost with an 8K context window.",
"THUDM/GLM-4-32B-0414.description": "GLM-4-32B-0414 is a next-gen open GLM model with 32B parameters, comparable to OpenAI GPT and DeepSeek V3/R1 series in performance.",
"THUDM/GLM-4-9B-0414.description": "GLM-4-9B-0414 is a 9B GLM model that inherits GLM-4-32B techniques while offering a lighter deployment. It performs well in code generation, web design, SVG generation, and search-based writing.",
"THUDM/GLM-4.1V-9B-Thinking.description": "GLM-4.1V-9B-Thinking is an open-source VLM from Zhipu AI and Tsinghua KEG Lab, designed for complex multimodal cognition. Built on GLM-4-9B-0414, it adds chain-of-thought reasoning and RL to significantly improve cross-modal reasoning and stability.",
"THUDM/GLM-Z1-32B-0414.description": "GLM-Z1-32B-0414 is a deep-thinking reasoning model built from GLM-4-32B-0414 with cold-start data and expanded RL, further trained on math, code, and logic. It significantly improves math ability and complex task solving over the base model.",
"THUDM/GLM-Z1-9B-0414.description": "GLM-Z1-9B-0414 is a small 9B-parameter GLM model that retains open-source strengths while delivering impressive capability. It performs strongly on math reasoning and general tasks, leading its size class among open models.",
"THUDM/glm-4-9b-chat.description": "GLM-4-9B-Chat is the open-source GLM-4 model from Zhipu AI. It performs strongly across semantics, math, reasoning, code, and knowledge. Beyond multi-turn chat, it supports web browsing, code execution, custom tool calls, and long-text reasoning. It supports 26 languages (including Chinese, English, Japanese, Korean, German). It performs well on AlignBench-v2, MT-Bench, MMLU, and C-Eval, and supports up to 128K context for academic and business use.",
"Tongyi-Zhiwen/QwenLong-L1-32B.description": "QwenLong-L1-32B is the first long-context reasoning model (LRM) trained with RL, optimized for long-text reasoning. Its progressive context expansion RL enables stable transfer from short to long context. It surpasses OpenAI-o3-mini and Qwen3-235B-A22B on seven long-context document QA benchmarks, rivaling Claude-3.7-Sonnet-Thinking. It is especially strong at math, logic, and multi-hop reasoning.",
"Yi-34B-Chat.description": "Yi-1.5-34B retains the series strong general language abilities while using incremental training on 500B high-quality tokens to significantly improve math logic and coding.",
"abab5.5-chat.description": "Built for productivity scenarios with complex task handling and efficient text generation for professional use.",
"abab5.5s-chat.description": "Designed for Chinese persona chat, delivering high-quality Chinese dialogue for various applications.",
"abab6.5g-chat.description": "Designed for multilingual persona chat, supporting high-quality dialogue generation in English and other languages.",
"abab6.5s-chat.description": "Suitable for a wide range of NLP tasks, including text generation and dialogue systems.",
"abab6.5t-chat.description": "Optimized for Chinese persona chat, providing fluent dialogue that fits Chinese expression habits.",
"accounts/fireworks/models/deepseek-r1.description": "DeepSeek-R1 is a state-of-the-art LLM optimized with reinforcement learning and cold-start data, delivering excellent reasoning, math, and coding performance.",
"accounts/fireworks/models/deepseek-v3.description": "A powerful Mixture-of-Experts (MoE) language model from DeepSeek with 671B total parameters and 37B active parameters per token.",
"accounts/fireworks/models/llama-v3-70b-instruct.description": "Meta developed and released the Meta Llama 3 LLM series, which includes pre-trained and instruction-tuned text generation models at 8B and 70B. The Llama 3 instruction-tuned models are optimized for conversational use and outperform many existing open chat models on common industry benchmarks.",
"accounts/fireworks/models/llama-v3-8b-instruct-hf.description": "The Meta Llama 3 instruction-tuned models are optimized for conversational use and outperform many existing open chat models on common industry benchmarks. Llama 3 8B Instruct (HF version) is the original FP16 version of Llama 3 8B Instruct, with results expected to match the official Hugging Face implementation.",
"accounts/fireworks/models/llama-v3-8b-instruct.description": "Meta developed and released the Meta Llama 3 LLM series, a collection of pre-trained and instruction-tuned text generation models at 8B and 70B. The Llama 3 instruction-tuned models are optimized for conversational use and outperform many existing open chat models on common industry benchmarks.",
"accounts/fireworks/models/llama-v3p1-405b-instruct.description": "Meta Llama 3.1 is a multilingual LLM family with pre-trained and instruction-tuned generation models at 8B, 70B, and 405B sizes. The instruction-tuned text models are optimized for multilingual dialogue and outperform many existing open and closed chat models on common industry benchmarks. 405B is the most capable model in the Llama 3.1 family, using FP8 inference that closely matches the reference implementation.",
"accounts/fireworks/models/llama-v3p1-70b-instruct.description": "Meta Llama 3.1 is a multilingual LLM family with pre-trained and instruction-tuned generation models at 8B, 70B, and 405B sizes. The instruction-tuned text models are optimized for multilingual dialogue and outperform many existing open and closed chat models on common industry benchmarks.",
"accounts/fireworks/models/llama-v3p1-8b-instruct.description": "Meta Llama 3.1 is a multilingual LLM family with pre-trained and instruction-tuned generation models at 8B, 70B, and 405B sizes. The instruction-tuned text models are optimized for multilingual dialogue and outperform many existing open and closed chat models on common industry benchmarks.",
"accounts/fireworks/models/llama-v3p2-11b-vision-instruct.description": "An instruction-tuned vision reasoning model from Meta with 11B parameters, optimized for visual recognition, image reasoning, captioning, and image-related Q&A. It understands visual data such as charts and graphs and bridges vision and language by generating textual descriptions of image details.",
"accounts/fireworks/models/llama-v3p2-3b-instruct.description": "Llama 3.2 3B Instruct is a lightweight multilingual model from Meta, designed for efficient runtime with significant latency and cost advantages over larger models. Typical use cases include query/prompt rewriting and writing assistance.",
"accounts/fireworks/models/llama-v3p2-90b-vision-instruct.description": "An instruction-tuned vision reasoning model from Meta with 90B parameters, optimized for visual recognition, image reasoning, captioning, and image-related Q&A. It understands visual data such as charts and graphs and bridges vision and language by generating textual descriptions of image details. Note: this model is currently provided experimentally as a serverless model. For production use, note that Fireworks may retire the deployment on short notice.",
"accounts/fireworks/models/llama-v3p3-70b-instruct.description": "Llama 3.3 70B Instruct is the December update to Llama 3.1 70B. It improves tool use, multilingual text support, math, and coding over the July 2024 release. It reaches industry-leading performance in reasoning, math, and instruction following, offering performance comparable to 3.1 405B with significant speed and cost advantages.",
"accounts/fireworks/models/mistral-small-24b-instruct-2501.description": "A 24B-parameter model with state-of-the-art capability comparable to larger models.",
"accounts/fireworks/models/mixtral-8x22b-instruct.description": "Mixtral MoE 8x22B Instruct v0.1 is the instruction-tuned version of Mixtral MoE 8x22B v0.1, with the chat completion API enabled.",
"accounts/fireworks/models/mixtral-8x7b-instruct.description": "Mixtral MoE 8x7B Instruct is the instruction-tuned version of Mixtral MoE 8x7B, with the chat completion API enabled.",
"accounts/fireworks/models/mythomax-l2-13b.description": "An improved variant of MythoMix, possibly its more refined form, merging MythoLogic-L2 and Huginn with a highly experimental tensor-type merge technique. Its unique nature makes it excellent for storytelling and roleplay.",
"accounts/fireworks/models/phi-3-vision-128k-instruct.description": "Phi-3-Vision-128K-Instruct is a lightweight, state-of-the-art open multimodal model built from synthetic data and curated public web datasets, focusing on high-quality, reasoning-intensive text and vision data. It belongs to the Phi-3 family, with a multimodal version supporting a 128K context length (in tokens). The model undergoes rigorous enhancement, including supervised fine-tuning and direct preference optimization, to ensure accurate instruction following and strong safety measures.",
"accounts/fireworks/models/qwen-qwq-32b-preview.description": "The Qwen QwQ model focuses on advancing AI reasoning, demonstrating that open models can rival closed frontier models in reasoning. QwQ-32B-Preview is an experimental release that matches o1 and surpasses GPT-4o and Claude 3.5 Sonnet on reasoning and analysis across GPQA, AIME, MATH-500, and LiveCodeBench. Note: this model is currently provided experimentally as a serverless model. For production use, note that Fireworks may retire the deployment on short notice.",
"accounts/fireworks/models/qwen2-vl-72b-instruct.description": "The 72B Qwen-VL model is Alibabas latest iteration, reflecting nearly a year of innovation.",
"accounts/fireworks/models/qwen2p5-72b-instruct.description": "Qwen2.5 is a decoder-only LLM series developed by the Qwen team and Alibaba Cloud, offering 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B sizes, with both base and instruction-tuned variants.",
"accounts/fireworks/models/qwen2p5-coder-32b-instruct.description": "Qwen2.5-Coder is the latest Qwen LLM designed for code (formerly CodeQwen). Note: this model is currently provided experimentally as a serverless model. For production use, note that Fireworks may retire the deployment on short notice.",
"accounts/yi-01-ai/models/yi-large.description": "Yi-Large is a top-tier LLM that ranks just below GPT-4, Gemini 1.5 Pro, and Claude 3 Opus on the LMSYS leaderboard. It excels in multilingual capability, especially Spanish, Chinese, Japanese, German, and French. Yi-Large is also developer-friendly, using the same API schema as OpenAI for easy integration.",
"ai21-jamba-1.5-large.description": "A 398B-parameter (94B active) multilingual model with a 256K context window, function calling, structured output, and grounded generation.",
"ai21-jamba-1.5-mini.description": "A 52B-parameter (12B active) multilingual model with a 256K context window, function calling, structured output, and grounded generation.",
"ai21-labs/AI21-Jamba-1.5-Large.description": "A 398B-parameter (94B active) multilingual model with a 256K context window, function calling, structured output, and grounded generation.",
"ai21-labs/AI21-Jamba-1.5-Mini.description": "A 52B-parameter (12B active) multilingual model with a 256K context window, function calling, structured output, and grounded generation.",
"alibaba/qwen-3-14b.description": "Qwen3 is the latest generation in the Qwen series, offering a comprehensive set of dense and MoE models. Built on extensive training, it brings breakthroughs in reasoning, instruction following, agent capabilities, and multilingual support.",
"alibaba/qwen-3-235b.description": "Qwen3 is the latest generation in the Qwen series, offering a comprehensive set of dense and MoE models. Built on extensive training, it brings breakthroughs in reasoning, instruction following, agent capabilities, and multilingual support.",
"alibaba/qwen-3-30b.description": "Qwen3 is the latest generation in the Qwen series, offering a comprehensive set of dense and MoE models. Built on extensive training, it brings breakthroughs in reasoning, instruction following, agent capabilities, and multilingual support.",
"alibaba/qwen-3-32b.description": "Qwen3 is the latest generation in the Qwen series, offering a comprehensive set of dense and MoE models. Built on extensive training, it brings breakthroughs in reasoning, instruction following, agent capabilities, and multilingual support.",
"alibaba/qwen3-coder.description": "Qwen3-Coder-480B-A35B-Instruct is Qwens most agentic code model, performing strongly on agentic coding, agentic browser use, and other core coding tasks, matching Claude Sonnet-level results.",
"amazon/nova-lite.description": "A very low-cost multimodal model with extremely fast processing of image, video, and text inputs.",
"amazon/nova-micro.description": "A text-only model offering ultra-low latency at very low cost.",
"amazon/nova-pro.description": "A highly capable multimodal model with the best balance of accuracy, speed, and cost for a wide range of tasks.",
"amazon/titan-embed-text-v2.description": "Amazon Titan Text Embeddings V2 is a lightweight, efficient multilingual embedding model supporting 1024, 512, and 256 dimensions.",
"anthropic.claude-3-5-sonnet-20240620-v1:0.description": "Claude 3.5 Sonnet raises the industry standard, outperforming competitors and Claude 3 Opus across broad evaluations while keeping mid-tier speed and cost.",
"anthropic.claude-3-5-sonnet-20241022-v2:0.description": "Claude 3.5 Sonnet raises the industry standard, outperforming competitors and Claude 3 Opus across broad evaluations while keeping mid-tier speed and cost.",
"anthropic.claude-3-haiku-20240307-v1:0.description": "Claude 3 Haiku is Anthropics fastest, most compact model, delivering near-instant responses for simple queries. It enables seamless, human-like AI experiences and supports image input with a 200K context window.",
"anthropic.claude-3-opus-20240229-v1:0.description": "Claude 3 Opus is Anthropics most powerful AI model with state-of-the-art performance on highly complex tasks. It handles open-ended prompts and novel scenarios with exceptional fluency and human-like understanding, and supports image input with a 200K context window.",
"anthropic.claude-3-sonnet-20240229-v1:0.description": "Claude 3 Sonnet balances intelligence and speed for enterprise workloads, offering strong value at lower cost. It is designed as a reliable workhorse for scaled AI deployments and supports image input with a 200K context window.",
"anthropic.claude-instant-v1.description": "A fast, economical, yet capable model for everyday chat, text analysis, summarization, and document Q&A.",
"anthropic.claude-v2.description": "A highly capable model across tasks from complex dialogue and creative generation to detailed instruction following.",
"anthropic.claude-v2:1.description": "An updated Claude 2 with double the context window and improved reliability, hallucination rate, and evidence-based accuracy for long documents and RAG.",
"anthropic/claude-3-haiku.description": "Claude 3 Haiku is Anthropics fastest model, designed for enterprise workloads with longer prompts. It can quickly analyze large documents like quarterly reports, contracts, or legal cases at half the cost of peers.",
"anthropic/claude-3-opus.description": "Claude 3 Opus is Anthropics most intelligent model with market-leading performance on highly complex tasks, handling open-ended prompts and novel scenarios with exceptional fluency and human-like understanding.",
"anthropic/claude-3.5-haiku.description": "Claude 3.5 Haiku features enhanced speed, coding accuracy, and tool use, suitable for scenarios with demanding requirements for speed and tool interaction.",
"anthropic/claude-3.5-sonnet.description": "Claude 3.5 Sonnet is the fast, efficient model in the Sonnet family, offering better coding and reasoning performance, with some versions gradually replaced by Sonnet 3.7 and later.",
"anthropic/claude-3.7-sonnet.description": "Claude 3.7 Sonnet is an upgraded Sonnet model with stronger reasoning and coding, suitable for enterprise-grade complex tasks.",
"anthropic/claude-haiku-4.5.description": "Claude Haiku 4.5 is Anthropics high-performance fast model, delivering very low latency while maintaining high accuracy.",
"anthropic/claude-opus-4.1.description": "Opus 4.1 is Anthropics high-end model optimized for programming, complex reasoning, and long-running tasks.",
"anthropic/claude-opus-4.5.description": "Claude Opus 4.5 is Anthropics flagship model, combining top-tier intelligence with scalable performance for complex, high-quality reasoning tasks.",
"anthropic/claude-opus-4.description": "Opus 4 is Anthropics flagship model designed for complex tasks and enterprise applications.",
"anthropic/claude-sonnet-4.5.description": "Claude Sonnet 4.5 is Anthropics latest hybrid reasoning model optimized for complex reasoning and coding.",
"anthropic/claude-sonnet-4.description": "Claude Sonnet 4 is Anthropics hybrid reasoning model with mixed thinking and non-thinking capability.",
"ascend-tribe/pangu-pro-moe.description": "Pangu-Pro-MoE 72B-A16B is a sparse LLM with 72B total and 16B active parameters, based on a grouped MoE (MoGE) architecture. It groups experts during selection and constrains tokens to activate equal experts per group, balancing load and improving deployment efficiency on Ascend.",
"aya.description": "Aya 23 is Coheres multilingual model supporting 23 languages for diverse use cases.",
"aya:35b.description": "Aya 23 is Coheres multilingual model supporting 23 languages for diverse use cases.",
"azure-DeepSeek-R1-0528.description": "Deployed by Microsoft; DeepSeek R1 has been upgraded to DeepSeek-R1-0528. The update increases compute and post-training algorithm optimizations, significantly improving reasoning depth and inference. It performs strongly on math, coding, and general logic benchmarks, approaching leading models like O3 and Gemini 2.5 Pro.",
"baichuan-m2-32b.description": "Baichuan M2 32B is a MoE model from Baichuan Intelligence with strong reasoning.",
"baichuan/baichuan2-13b-chat.description": "Baichuan-13B is an open-source, commercially usable 13B-parameter LLM from Baichuan, achieving best-in-class results for its size on authoritative Chinese and English benchmarks.",
"baidu/ERNIE-4.5-300B-A47B.description": "ERNIE-4.5-300B-A47B is a Baidu MoE LLM with 300B total parameters and 47B active per token, balancing strong performance and compute efficiency. As a core ERNIE 4.5 model, it excels at understanding, generation, reasoning, and programming. It uses a multimodal heterogeneous MoE pretraining method with joint text-vision training to boost overall capability, especially instruction following and world knowledge.",
"baidu/ernie-5.0-thinking-preview.description": "ERNIE 5.0 Thinking Preview is Baidus next-generation native multimodal ERNIE model, strong in multimodal understanding, instruction following, creation, factual Q&A, and tool calling.",
"black-forest-labs/flux-1.1-pro.description": "FLUX 1.1 Pro is a faster, improved FLUX Pro with excellent image quality and prompt adherence.",
"black-forest-labs/flux-dev.description": "FLUX Dev is the development version of FLUX for non-commercial use.",
"black-forest-labs/flux-pro.description": "FLUX Pro is the professional FLUX model for high-quality image output.",
"black-forest-labs/flux-schnell.description": "FLUX Schnell is a fast image generation model optimized for speed.",
"c4ai-aya-expanse-32b.description": "Aya Expanse is a high-performance 32B multilingual model that uses instruction tuning, data arbitrage, preference training, and model merging to rival monolingual models. It supports 23 languages.",
"c4ai-aya-expanse-8b.description": "Aya Expanse is a high-performance 8B multilingual model that uses instruction tuning, data arbitrage, preference training, and model merging to rival monolingual models. It supports 23 languages.",
"c4ai-aya-vision-32b.description": "Aya Vision is a state-of-the-art multimodal model that performs strongly on key language, text, and vision benchmarks. It supports 23 languages. This 32B version focuses on top-tier multilingual performance.",
"c4ai-aya-vision-8b.description": "Aya Vision is a state-of-the-art multimodal model that performs strongly on key language, text, and vision benchmarks. This 8B version focuses on low latency and strong performance.",
"charglm-3.description": "CharGLM-3 is built for roleplay and emotional companionship, supporting ultra-long multi-turn memory and personalized dialogue.",
"charglm-4.description": "CharGLM-4 is built for roleplay and emotional companionship, supporting ultra-long multi-turn memory and personalized dialogue.",
"chatgpt-4o-latest.description": "ChatGPT-4o is a dynamic model updated in real time. It combines strong language understanding and generation for large-scale use cases like customer support, education, and technical assistance.",
"claude-2.0.description": "Claude 2 delivers key enterprise improvements, including a leading 200K-token context, reduced hallucinations, system prompts, and a new test feature: tool calling.",
"claude-2.1.description": "Claude 2 delivers key enterprise improvements, including a leading 200K-token context, reduced hallucinations, system prompts, and a new test feature: tool calling.",
"claude-3-5-haiku-20241022.description": "Claude 3.5 Haiku is Anthropics fastest next-gen model. Compared to Claude 3 Haiku, it improves across skills and surpasses the prior largest model Claude 3 Opus on many intelligence benchmarks.",
"claude-3-5-haiku-latest.description": "Claude 3.5 Haiku delivers fast responses for lightweight tasks.",
"claude-3-7-sonnet-20250219.description": "Claude 3.7 Sonnet is Anthropics most intelligent model and the first hybrid reasoning model on the market. It can produce near-instant responses or extended step-by-step reasoning that users can see. Sonnet is especially strong at coding, data science, vision, and agent tasks.",
"claude-3-7-sonnet-latest.description": "Claude 3.7 Sonnet is Anthropics latest and most capable model for highly complex tasks, excelling in performance, intelligence, fluency, and understanding.",
"claude-3-haiku-20240307.description": "Claude 3 Haiku is Anthropics fastest and most compact model, designed for near-instant responses with fast, accurate performance.",
"claude-3-opus-20240229.description": "Claude 3 Opus is Anthropics most powerful model for highly complex tasks, excelling in performance, intelligence, fluency, and comprehension.",
"claude-3-sonnet-20240229.description": "Claude 3 Sonnet balances intelligence and speed for enterprise workloads, delivering high utility at lower cost and reliable large-scale deployment.",
"claude-3.5-sonnet.description": "Claude 3.5 Sonnet excels at coding, writing, and complex reasoning.",
"claude-3.7-sonnet-thought.description": "Claude 3.7 Sonnet with extended thinking for complex reasoning tasks.",
"claude-3.7-sonnet.description": "Claude 3.7 Sonnet is an upgraded version with extended context and capabilities.",
"claude-haiku-4-5-20251001.description": "Claude Haiku 4.5 is Anthropic's fastest and most intelligent Haiku model, with lightning speed and extended thinking.",
"claude-haiku-4.5.description": "Claude Haiku 4.5 is a fast and efficient model for various tasks.",
"claude-opus-4-1-20250805-thinking.description": "Claude Opus 4.1 Thinking is an advanced variant that can reveal its reasoning process.",
"claude-opus-4-1-20250805.description": "Claude Opus 4.1 is Anthropic's latest and most capable model for highly complex tasks, excelling in performance, intelligence, fluency, and understanding.",
"claude-opus-4-20250514.description": "Claude Opus 4 is Anthropic's most powerful model for highly complex tasks, excelling in performance, intelligence, fluency, and understanding.",
"claude-opus-4-5-20251101.description": "Claude Opus 4.5 is Anthropics flagship model, combining outstanding intelligence with scalable performance, ideal for complex tasks requiring the highest-quality responses and reasoning.",
"claude-opus-4-6.description": "Claude Opus 4.6 is Anthropic's most intelligent model for building agents and coding.",
"claude-sonnet-4-20250514-thinking.description": "Claude Sonnet 4 Thinking can produce near-instant responses or extended step-by-step thinking with visible process.",
"claude-sonnet-4-20250514.description": "Claude Sonnet 4 is Anthropic's most intelligent model to date, offering near-instant responses or extended step-by-step thinking with fine-grained control for API users.",
"claude-sonnet-4-5-20250929.description": "Claude Sonnet 4.5 is Anthropic's most intelligent model to date.",
"claude-sonnet-4-6.description": "Claude Sonnet 4.6 is Anthropic's best combination of speed and intelligence.",
"claude-sonnet-4.description": "Claude Sonnet 4 is the latest generation with improved performance across all tasks.",
"codegeex-4.description": "CodeGeeX-4 is a powerful AI coding assistant that supports multilingual Q&A and code completion to boost developer productivity.",
"codegeex4-all-9b.description": "CodeGeeX4-ALL-9B is a multilingual code generation model supporting code completion and generation, code interpreter, web search, function calling, and repo-level code Q&A, covering a wide range of software development scenarios. It is a top-tier code model under 10B parameters.",
"codegemma.description": "CodeGemma is a lightweight model for varied programming tasks, enabling fast iteration and integration.",
"codegemma:2b.description": "CodeGemma is a lightweight model for varied programming tasks, enabling fast iteration and integration.",
"codellama.description": "Code Llama is an LLM focused on code generation and discussion, with broad language support for developer workflows.",
"codellama/CodeLlama-34b-Instruct-hf.description": "Code Llama is an LLM focused on code generation and discussion, with broad language support for developer workflows.",
"codellama:13b.description": "Code Llama is an LLM focused on code generation and discussion, with broad language support for developer workflows.",
"codellama:34b.description": "Code Llama is an LLM focused on code generation and discussion, with broad language support for developer workflows.",
"codellama:70b.description": "Code Llama is an LLM focused on code generation and discussion, with broad language support for developer workflows.",
"codeqwen.description": "CodeQwen1.5 is a large language model trained on extensive code data, built for complex programming tasks.",
"codestral-latest.description": "Codestral is our most advanced coding model; v2 (Jan 2025) targets low-latency, high-frequency tasks like FIM, code correction, and test generation.",
"codestral.description": "Codestral is Mistral AIs first code model, delivering strong code generation support.",
"cogito-2.1:671b.description": "Cogito v2.1 671B is a US open-source LLM free for commercial use, with performance rivaling top models, higher token reasoning efficiency, a 128k long context, and strong overall capability.",
"cogview-3-flash.description": "CogView-3-Flash is a free image generation model launched by Zhipu. It generates images that align with user instructions while achieving higher aesthetic quality scores. CogView-3-Flash is primarily used in fields such as artistic creation, design reference, game development, and virtual reality, helping users rapidly convert text descriptions into images.",
"cogview-4.description": "CogView-4 is Zhipus first open-source text-to-image model that can generate Chinese characters. It improves semantic understanding, image quality, and Chinese/English text rendering, supports arbitrary-length bilingual prompts, and can generate images at any resolution within specified ranges.",
"cohere-command-r-plus.description": "Command R+ is an advanced RAG-optimized model built for enterprise workloads.",
"cohere-command-r.description": "Command R is a scalable generative model designed for RAG and tool use, enabling production-grade AI.",
"cohere/Cohere-command-r-plus.description": "Command R+ is an advanced RAG-optimized model built for enterprise workloads.",
"cohere/Cohere-command-r.description": "Command R is a scalable generative model designed for RAG and tool use, enabling production-grade AI.",
"cohere/command-a.description": "Command A is Coheres strongest model yet, excelling at tool use, agents, RAG, and multilingual use cases. It has a 256K context length, runs on just two GPUs, and delivers 150% higher throughput than Command R+ 08-2024.",
"cohere/embed-v4.0.description": "A model that classifies or converts text, images, or mixed content into embeddings.",
"comfyui/flux-dev.description": "FLUX.1 Dev is a high-quality text-to-image model (1050 steps), ideal for premium creative and artistic output.",
"comfyui/flux-kontext-dev.description": "FLUX.1 Kontext-dev is an image editing model that supports text-guided edits, including local edits and style transfer.",
"comfyui/flux-krea-dev.description": "FLUX.1 Krea-dev is a safety-enhanced text-to-image model co-developed with Krea, with built-in safety filters.",
"comfyui/flux-schnell.description": "FLUX.1 Schnell is an ultra-fast text-to-image model that generates high-quality images in 1-4 steps, ideal for real-time use and rapid prototyping.",
"comfyui/stable-diffusion-15.description": "Stable Diffusion 1.5 is a classic 512x512 text-to-image model, ideal for rapid prototyping and creative experiments.",
"comfyui/stable-diffusion-35-inclclip.description": "Stable Diffusion 3.5 with built-in CLIP/T5 encoders needs no external encoder files, suitable for models like sd3.5_medium_incl_clips with lower resource usage.",
"comfyui/stable-diffusion-35.description": "Stable Diffusion 3.5 is a next-generation text-to-image model with Large and Medium variants. It requires external CLIP encoder files and delivers excellent image quality and prompt adherence.",
"comfyui/stable-diffusion-custom-refiner.description": "Custom SDXL image-to-image model. Use custom_sd_lobe.safetensors as the model filename; if you have a VAE, use custom_sd_vae_lobe.safetensors. Place model files in the required Comfy folders.",
"comfyui/stable-diffusion-custom.description": "Custom SD text-to-image model. Use custom_sd_lobe.safetensors as the model filename; if you have a VAE, use custom_sd_vae_lobe.safetensors. Place model files in the required Comfy folders.",
"comfyui/stable-diffusion-refiner.description": "SDXL image-to-image model performs high-quality transformations from input images, supporting style transfer, restoration, and creative variations.",
"comfyui/stable-diffusion-xl.description": "SDXL is a text-to-image model supporting 1024x1024 high-resolution generation with better image quality and detail.",
"command-a-03-2025.description": "Command A is our most capable model to date, excelling at tool use, agents, RAG, and multilingual scenarios. It has a 256K context window, runs on just two GPUs, and delivers 150% higher throughput than Command R+ 08-2024.",
"command-light-nightly.description": "To shorten the gap between major releases, we offer nightly Command builds. For the command-light series this is called command-light-nightly. It is the newest, most experimental (and potentially unstable) version, updated regularly without notice, so it is not recommended for production.",
"command-light.description": "A smaller, faster Command variant that is nearly as capable but faster.",
"command-nightly.description": "To shorten the gap between major releases, we offer nightly Command builds. For the Command series this is called command-nightly. It is the newest, most experimental (and potentially unstable) version, updated regularly without notice, so it is not recommended for production.",
"command-r-03-2024.description": "command-r is an instruction-following chat model that performs language tasks with higher quality, improved reliability, and longer context than previous models. It supports complex workflows such as code generation, RAG, tool use, and agents.",
"command-r-08-2024.description": "command-r-08-2024 is an updated Command R model released in August 2024.",
"command-r-plus-04-2024.description": "command-r-plus is an alias of command-r-plus-04-2024, so using command-r-plus in the API points to that model.",
"command-r-plus-08-2024.description": "Command R+ is an instruction-following chat model with higher quality, greater reliability, and a longer context window than previous models. It is best for complex RAG workflows and multi-step tool use.",
"command-r-plus.description": "Command R+ is a high-performance LLM designed for real enterprise scenarios and complex apps.",
"command-r.description": "Command R is an LLM optimized for chat and long-context tasks, ideal for dynamic interaction and knowledge management.",
"command-r7b-12-2024.description": "command-r7b-12-2024 is a small, efficient update released in December 2024. It excels at RAG, tool use, and agent tasks that require complex, multi-step reasoning.",
"command.description": "An instruction-following chat model that delivers higher quality and reliability on language tasks, with a longer context window than our base generative models.",
"computer-use-preview.description": "computer-use-preview is a specialized model for the \"computer use tool,\" trained to understand and execute computer-related tasks.",
"dall-e-2.description": "Second-generation DALL·E model with more realistic, accurate image generation and 4× the resolution of the first generation.",
"dall-e-3.description": "The latest DALL·E model, released in November 2023, supports more realistic, accurate image generation with stronger detail.",
"databricks/dbrx-instruct.description": "DBRX Instruct offers highly reliable instruction handling across industries.",
"deepseek-ai/DeepSeek-OCR.description": "DeepSeek-OCR is a vision-language model from DeepSeek AI focused on OCR and \"context optical compression.\" It explores compressing context from images, efficiently processes documents, and converts them to structured text (e.g., Markdown). It accurately recognizes text in images, suited for document digitization, text extraction, and structured processing.",
"deepseek-ai/DeepSeek-R1-0528-Qwen3-8B.description": "DeepSeek-R1-0528-Qwen3-8B distills chain-of-thought from DeepSeek-R1-0528 into Qwen3 8B Base. It reaches SOTA among open models, beating Qwen3 8B by 10% on AIME 2024 and matching Qwen3-235B-thinking performance. It excels on math reasoning, programming, and general logic benchmarks. It shares the Qwen3-8B architecture but uses the DeepSeek-R1-0528 tokenizer.",
"deepseek-ai/DeepSeek-R1-0528.description": "DeepSeek R1 leverages additional compute and post-training algorithmic optimizations to deepen reasoning. It performs strongly across benchmarks in math, programming, and general logic, approaching leaders like o3 and Gemini 2.5 Pro.",
"deepseek-ai/DeepSeek-R1-Distill-Llama-70B.description": "DeepSeek-R1 distilled models use RL and cold-start data to improve reasoning and set new open-model multi-task benchmarks.",
"deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.description": "DeepSeek-R1 distilled models use RL and cold-start data to improve reasoning and set new open-model multi-task benchmarks.",
"deepseek-ai/DeepSeek-R1-Distill-Qwen-14B.description": "DeepSeek-R1 distilled models use RL and cold-start data to improve reasoning and set new open-model multi-task benchmarks.",
"deepseek-ai/DeepSeek-R1-Distill-Qwen-32B.description": "DeepSeek-R1-Distill-Qwen-32B is distilled from Qwen2.5-32B and fine-tuned on 800K curated DeepSeek-R1 samples. It excels in math, programming, and reasoning, achieving strong results on AIME 2024, MATH-500 (94.3% accuracy), and GPQA Diamond.",
"deepseek-ai/DeepSeek-R1-Distill-Qwen-7B.description": "DeepSeek-R1-Distill-Qwen-7B is distilled from Qwen2.5-Math-7B and fine-tuned on 800K curated DeepSeek-R1 samples. It performs strongly, with 92.8% on MATH-500, 55.5% on AIME 2024, and a 1189 CodeForces rating for a 7B model.",
"deepseek-ai/DeepSeek-R1.description": "DeepSeek-R1 improves reasoning with RL and cold-start data, setting new open-model multi-task benchmarks and surpassing OpenAI-o1-mini.",
"deepseek-ai/DeepSeek-V2.5.description": "DeepSeek-V2.5 upgrades DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, combining general and coding abilities. It improves writing and instruction following for better preference alignment, and shows significant gains on AlpacaEval 2.0, ArenaHard, AlignBench, and MT-Bench.",
"deepseek-ai/DeepSeek-V3.1-Terminus.description": "DeepSeek-V3.1-Terminus is an updated V3.1 model positioned as a hybrid agent LLM. It fixes user-reported issues and improves stability, language consistency, and reduces mixed Chinese/English and abnormal characters. It integrates Thinking and Non-thinking modes with chat templates for flexible switching. It also improves Code Agent and Search Agent performance for more reliable tool use and multi-step tasks.",
"deepseek-ai/DeepSeek-V3.1.description": "DeepSeek V3.1 uses a hybrid reasoning architecture and supports both thinking and non-thinking modes.",
"deepseek-ai/DeepSeek-V3.2-Exp.description": "DeepSeek V3.2 Exp uses a hybrid reasoning architecture and supports both thinking and non-thinking modes.",
"deepseek-ai/DeepSeek-V3.2.description": "DeepSeek-V3.2 is a model that combines high computational efficiency with excellent reasoning and Agent performance. Its approach is based on three major technological breakthroughs: DeepSeek Sparse Attention (DSA), an efficient attention mechanism that significantly reduces computational complexity while maintaining model performance, and is specifically optimized for long-context scenarios; a scalable reinforcement learning framework, through which the model's performance can rival GPT-5, and its high-compute version can rival Gemini-3.0-Pro in reasoning capabilities; and a large-scale Agent task synthesis pipeline, designed to integrate reasoning capabilities into tool-using scenarios, thereby improving instruction-following and generalization abilities in complex interactive environments. The model achieved gold medal results in the 2025 International Mathematical Olympiad (IMO) and International Informatics Olympiad (IOI).",
"deepseek-ai/DeepSeek-V3.description": "DeepSeek-V3 is a 671B-parameter MoE model using MLA and DeepSeekMoE with loss-free load balancing for efficient training and inference. Pretrained on 14.8T high-quality tokens with SFT and RL, it outperforms other open models and approaches leading closed models.",
"deepseek-ai/deepseek-llm-67b-chat.description": "DeepSeek LLM Chat (67B) is an innovative model offering deep language understanding and interaction.",
"deepseek-ai/deepseek-v3.1-terminus.description": "DeepSeek V3.1 is a next-gen reasoning model with stronger complex reasoning and chain-of-thought for deep analysis tasks.",
"deepseek-ai/deepseek-v3.1.description": "DeepSeek V3.1 is a next-gen reasoning model with stronger complex reasoning and chain-of-thought for deep analysis tasks.",
"deepseek-ai/deepseek-v3.2.description": "DeepSeek V3.2 is a next-gen reasoning model with stronger complex reasoning and chain-of-thought capabilities.",
"deepseek-ai/deepseek-vl2.description": "DeepSeek-VL2 is a MoE vision-language model based on DeepSeekMoE-27B with sparse activation, achieving strong performance with only 4.5B active parameters. It excels at visual QA, OCR, document/table/chart understanding, and visual grounding.",
"deepseek-chat.description": "DeepSeek V3.2 balances reasoning and output length for daily QA and agent tasks. Public benchmarks reach GPT-5 levels, and it is the first to integrate thinking into tool use, leading open-source agent evaluations.",
"deepseek-coder-33B-instruct.description": "DeepSeek Coder 33B is a code language model trained on 2T tokens (87% code, 13% Chinese/English text). It introduces a 16K context window and fill-in-the-middle tasks, providing project-level code completion and snippet infilling.",
"deepseek-coder-v2.description": "DeepSeek Coder V2 is an open-source MoE code model that performs strongly on coding tasks, comparable to GPT-4 Turbo.",
"deepseek-coder-v2:236b.description": "DeepSeek Coder V2 is an open-source MoE code model that performs strongly on coding tasks, comparable to GPT-4 Turbo.",
"deepseek-ocr.description": "DeepSeek-OCR is a vision-language model from DeepSeek AI focused on OCR and \"contextual optical compression.\" It explores compressing contextual information from images, efficiently processes documents, and converts them into structured text formats such as Markdown. It accurately recognizes text in images, making it ideal for document digitization, text extraction, and structured processing.",
"deepseek-r1-0528.description": "685B full model released on 2025-05-28. DeepSeek-R1 uses large-scale RL in post-training, greatly improving reasoning with minimal labeled data, and performs strongly on math, coding, and natural language reasoning.",
"deepseek-r1-250528.description": "DeepSeek R1 250528 is the full DeepSeek-R1 reasoning model for hard math and logic tasks.",
"deepseek-r1-70b-fast-online.description": "DeepSeek R1 70B fast edition with real-time web search, delivering quicker responses while maintaining performance.",
"deepseek-r1-70b-online.description": "DeepSeek R1 70B standard edition with real-time web search, suited for up-to-date chat and text tasks.",
"deepseek-r1-distill-llama-70b.description": "DeepSeek R1 Distill Llama 70B combines R1 reasoning with the Llama ecosystem.",
"deepseek-r1-distill-llama-8b.description": "DeepSeek-R1-Distill-Llama-8B is distilled from Llama-3.1-8B using DeepSeek R1 outputs.",
"deepseek-r1-distill-llama.description": "deepseek-r1-distill-llama is distilled from DeepSeek-R1 on Llama.",
"deepseek-r1-distill-qianfan-70b.description": "DeepSeek R1 Distill Qianfan 70B is an R1 distill based on Qianfan-70B with strong value.",
"deepseek-r1-distill-qianfan-8b.description": "DeepSeek R1 Distill Qianfan 8B is an R1 distill based on Qianfan-8B for small and mid-sized apps.",
"deepseek-r1-distill-qianfan-llama-70b.description": "DeepSeek R1 Distill Qianfan Llama 70B is an R1 distill based on Llama-70B.",
"deepseek-r1-distill-qwen-1.5b.description": "DeepSeek R1 Distill Qwen 1.5B is an ultra-light distill model for very low-resource environments.",
"deepseek-r1-distill-qwen-14b.description": "DeepSeek R1 Distill Qwen 14B is a mid-size distill model for multi-scenario deployment.",
"deepseek-r1-distill-qwen-32b.description": "DeepSeek R1 Distill Qwen 32B is an R1 distill based on Qwen-32B, balancing performance and cost.",
"deepseek-r1-distill-qwen-7b.description": "DeepSeek R1 Distill Qwen 7B is a lightweight distill model for edge and private enterprise environments.",
"deepseek-r1-distill-qwen.description": "deepseek-r1-distill-qwen is distilled from DeepSeek-R1 on Qwen.",
"deepseek-r1-fast-online.description": "DeepSeek R1 fast full version with real-time web search, combining 671B-scale capability and faster response.",
"deepseek-r1-online.description": "DeepSeek R1 full version with 671B parameters and real-time web search, offering stronger understanding and generation.",
"deepseek-r1.description": "DeepSeek-R1 uses cold-start data before RL and performs comparably to OpenAI-o1 on math, coding, and reasoning.",
"deepseek-reasoner.description": "DeepSeek V3.2 Thinking is a deep reasoning model that generates chain-of-thought before outputs for higher accuracy, with top competition results and reasoning comparable to Gemini-3.0-Pro.",
"deepseek-v2.description": "DeepSeek V2 is an efficient MoE model for cost-effective processing.",
"deepseek-v2:236b.description": "DeepSeek V2 236B is DeepSeeks code-focused model with strong code generation.",
"deepseek-v3-0324.description": "DeepSeek-V3-0324 is a 671B-parameter MoE model with standout strengths in programming and technical capability, context understanding, and long-text handling.",
"deepseek-v3.1-terminus.description": "DeepSeek-V3.1-Terminus is a terminal-optimized LLM from DeepSeek, tailored for terminal devices.",
"deepseek-v3.1-think-250821.description": "DeepSeek V3.1 Think 250821 is the deep-thinking model corresponding to the Terminus version, built for high-performance reasoning.",
"deepseek-v3.1.description": "DeepSeek-V3.1 is a new hybrid reasoning model from DeepSeek, supporting both thinking and non-thinking modes and offering higher thinking efficiency than DeepSeek-R1-0528. Post-training optimizations greatly improve agent tool use and agent-task performance. It supports a 128k context window and up to 64k output tokens.",
"deepseek-v3.1:671b.description": "DeepSeek V3.1 is a next-generation reasoning model with improved complex reasoning and chain-of-thought, suited for tasks requiring deep analysis.",
"deepseek-v3.2-exp.description": "deepseek-v3.2-exp introduces sparse attention to improve training and inference efficiency on long text, at a lower price than deepseek-v3.1.",
"deepseek-v3.2-speciale.description": "On highly complex tasks, the Speciale model significantly outperforms the standard version, but it consumes considerably more tokens and incurs higher costs. Currently, DeepSeek-V3.2-Speciale is intended for research use only, does not support tool calls, and has not been specifically optimized for everyday conversation or writing tasks.",
"deepseek-v3.2-think.description": "DeepSeek V3.2 Think is a full deep-thinking model with stronger long-chain reasoning.",
"deepseek-v3.2.description": "DeepSeek-V3.2 is the first hybrid reasoning model from DeepSeek that integrates thinking into tool usage. It uses efficient architecture to save computation, large-scale reinforcement learning to enhance capabilities, and large-scale synthetic task data to strengthen generalization. The combination of these three achieves performance comparable to GPT-5-High, with significantly reduced output length, notably decreasing computational overhead and user wait times.",
"deepseek-v3.description": "DeepSeek-V3 is a powerful MoE model with 671B total parameters and 37B active per token.",
"deepseek-vl2-small.description": "DeepSeek VL2 Small is a lightweight multimodal version for resource-constrained and high-concurrency use.",
"deepseek-vl2.description": "DeepSeek VL2 is a multimodal model for image-text understanding and fine-grained visual QA.",
"deepseek/deepseek-chat-v3-0324.description": "DeepSeek V3 is a 685B-parameter MoE model and the latest iteration of DeepSeeks flagship chat series.\n\nIt builds on [DeepSeek V3](/deepseek/deepseek-chat-v3) and performs strongly across tasks.",
"deepseek/deepseek-chat-v3-0324:free.description": "DeepSeek V3 is a 685B-parameter MoE model and the latest iteration of DeepSeeks flagship chat series.\n\nIt builds on [DeepSeek V3](/deepseek/deepseek-chat-v3) and performs strongly across tasks.",
"deepseek/deepseek-chat-v3.1.description": "DeepSeek-V3.1 is DeepSeeks long-context hybrid reasoning model, supporting mixed thinking/non-thinking modes and tool integration.",
"deepseek/deepseek-chat.description": "DeepSeek-V3 is DeepSeeks high-performance hybrid reasoning model for complex tasks and tool integration.",
"deepseek/deepseek-math-v2.description": "DeepSeek Math V2 is a model that has made significant breakthroughs in mathematical reasoning capabilities. Its core innovation lies in the \"self-verification\" training mechanism, and it has achieved gold medal levels in several top mathematics competitions.",
"deepseek/deepseek-r1-0528.description": "DeepSeek R1 0528 is an updated variant focused on open availability and deeper reasoning.",
"deepseek/deepseek-r1-0528:free.description": "DeepSeek-R1 greatly improves reasoning with minimal labeled data and outputs a chain-of-thought before the final answer to improve accuracy.",
"deepseek/deepseek-r1-distill-llama-70b.description": "DeepSeek R1 Distill Llama 70B is a distilled LLM based on Llama 3.3 70B, fine-tuned using DeepSeek R1 outputs to achieve competitive performance with large frontier models.",
"deepseek/deepseek-r1-distill-llama-8b.description": "DeepSeek R1 Distill Llama 8B is a distilled LLM based on Llama-3.1-8B-Instruct, trained using DeepSeek R1 outputs.",
"deepseek/deepseek-r1-distill-qwen-14b.description": "DeepSeek R1 Distill Qwen 14B is a distilled LLM based on Qwen 2.5 14B, trained using DeepSeek R1 outputs. It surpasses OpenAI o1-mini on multiple benchmarks, achieving state-of-the-art results among dense models. Benchmark highlights:\nAIME 2024 pass@1: 69.7\nMATH-500 pass@1: 93.9\nCodeForces Rating: 1481\nFine-tuning on DeepSeek R1 outputs delivers competitive performance with larger frontier models.",
"deepseek/deepseek-r1-distill-qwen-32b.description": "DeepSeek R1 Distill Qwen 32B is a distilled LLM based on Qwen 2.5 32B, trained using DeepSeek R1 outputs. It surpasses OpenAI o1-mini on multiple benchmarks, achieving state-of-the-art results among dense models. Benchmark highlights:\nAIME 2024 pass@1: 72.6\nMATH-500 pass@1: 94.3\nCodeForces Rating: 1691\nFine-tuning on DeepSeek R1 outputs delivers competitive performance with larger frontier models.",
"deepseek/deepseek-r1.description": "DeepSeek R1 has been updated to DeepSeek-R1-0528. With more compute and post-training algorithmic optimizations, it significantly improves reasoning depth and capability. It performs strongly across math, programming, and general logic benchmarks, approaching leaders like o3 and Gemini 2.5 Pro.",
"deepseek/deepseek-r1/community.description": "DeepSeek R1 is the latest open-source model released by the DeepSeek team, with very strong reasoning performance, especially in math, coding, and reasoning tasks, comparable to OpenAI o1.",
"deepseek/deepseek-r1:free.description": "DeepSeek-R1 greatly improves reasoning with minimal labeled data and outputs a chain-of-thought before the final answer to improve accuracy.",
"deepseek/deepseek-reasoner.description": "DeepSeek-V3 Thinking (reasoner) is DeepSeeks experimental reasoning model, suitable for high-complexity reasoning tasks.",
"deepseek/deepseek-v3.description": "A fast general-purpose LLM with enhanced reasoning.",
"deepseek/deepseek-v3/community.description": "DeepSeek-V3 delivers a major breakthrough in reasoning speed over previous models. It ranks first among open-source models and rivals the most advanced closed models. DeepSeek-V3 adopts Multi-Head Latent Attention (MLA) and the DeepSeekMoE architecture, both fully validated in DeepSeek-V2. It also introduces a lossless auxiliary strategy for load balancing and a multi-token prediction training objective for stronger performance.",
"deepseek_r1.description": "DeepSeek-R1 is a reinforcement-learning-driven reasoning model that addresses repetition and readability issues. Before RL, it uses cold-start data to further improve reasoning performance. It matches OpenAI-o1 on math, coding, and reasoning tasks, with carefully designed training improving overall results.",
"deepseek_r1_distill_llama_70b.description": "DeepSeek-R1-Distill-Llama-70B is distilled from Llama-3.3-70B-Instruct. As part of the DeepSeek-R1 series, it is fine-tuned on DeepSeek-R1-generated samples and performs strongly in math, coding, and reasoning.",
"deepseek_r1_distill_qwen_14b.description": "DeepSeek-R1-Distill-Qwen-14B is distilled from Qwen2.5-14B and fine-tuned on 800K curated samples generated by DeepSeek-R1, delivering strong reasoning.",
"deepseek_r1_distill_qwen_32b.description": "DeepSeek-R1-Distill-Qwen-32B is distilled from Qwen2.5-32B and fine-tuned on 800K curated samples generated by DeepSeek-R1, excelling in math, coding, and reasoning.",
"devstral-2512.description": "Devstral 2 is an enterprise-level text model that excels at using tools to explore codebases, edit multiple files, and power software engineering agents.",
"devstral-2:123b.description": "Devstral 2 123B excels at using tools to explore codebases, edit multiple files, and support software engineering agents.",
"doubao-1.5-lite-32k.description": "Doubao-1.5-lite is a new lightweight model with ultra-fast response, delivering top-tier quality and latency.",
"doubao-1.5-pro-256k.description": "Doubao-1.5-pro-256k is a comprehensive upgrade to Doubao-1.5-Pro, improving overall performance by 10%. It supports a 256k context window and up to 12k output tokens, delivering higher performance, a larger window, and strong value for broader use cases.",
"doubao-1.5-pro-32k.description": "Doubao-1.5-pro is a new-generation flagship model with across-the-board upgrades, excelling in knowledge, coding, and reasoning.",
"doubao-1.5-thinking-pro-m.description": "Doubao-1.5 is a new deep-reasoning model (the m version includes native multimodal deep reasoning) that excels in math, coding, scientific reasoning, and general tasks like creative writing. It reaches or approaches top-tier results on benchmarks such as AIME 2024, Codeforces, and GPQA. It supports a 128k context window and 16k output.",
"doubao-1.5-thinking-pro.description": "Doubao-1.5 is a new deep-reasoning model that excels in math, coding, scientific reasoning, and general tasks like creative writing. It reaches or approaches top-tier results on benchmarks such as AIME 2024, Codeforces, and GPQA. It supports a 128k context window and 16k output.",
"doubao-1.5-thinking-vision-pro.description": "A new visual deep-reasoning model with stronger multimodal understanding and reasoning, achieving SOTA results on 37 of 59 public benchmarks.",
"doubao-1.5-ui-tars.description": "Doubao-1.5-UI-TARS is a native GUI-focused agent model that seamlessly interacts with interfaces through human-like perception, reasoning, and action.",
"doubao-1.5-vision-lite.description": "Doubao-1.5-vision-lite is an upgraded multimodal model that supports images at any resolution and extreme aspect ratios, enhancing visual reasoning, document recognition, detail understanding, and instruction following. It supports a 128k context window and up to 16k output tokens.",
"doubao-1.5-vision-pro-32k.description": "Doubao-1.5-vision-pro is an upgraded multimodal model that supports images at any resolution and extreme aspect ratios, enhancing visual reasoning, document recognition, detail understanding, and instruction following.",
"doubao-1.5-vision-pro.description": "Doubao-1.5-vision-pro is an upgraded multimodal model that supports images at any resolution and extreme aspect ratios, enhancing visual reasoning, document recognition, detail understanding, and instruction following.",
"doubao-lite-32k.description": "Ultra-fast response with better value, offering more flexible choices across scenarios. Supports reasoning and fine-tuning with a 32k context window.",
"doubao-pro-32k.description": "The best-performing flagship model for complex tasks, with strong results in reference QA, summarization, creation, text classification, and roleplay. Supports reasoning and fine-tuning with a 32k context window.",
"doubao-seed-1.6-flash.description": "Doubao-Seed-1.6-flash is an ultra-fast multimodal deep-reasoning model with TPOT as low as 10ms. It supports both text and vision, surpasses the previous lite model in text understanding, and matches competing pro models in vision. It supports a 256k context window and up to 16k output tokens.",
"doubao-seed-1.6-lite.description": "Doubao-Seed-1.6-lite is a new multimodal deep-reasoning model with adjustable reasoning effort (Minimal, Low, Medium, High), delivering better value and a strong choice for common tasks, with a context window up to 256k.",
"doubao-seed-1.6-thinking.description": "Doubao-Seed-1.6-thinking significantly strengthens reasoning, further improving core abilities in coding, math, and logical reasoning over Doubao-1.5-thinking-pro, while adding vision understanding. It supports a 256k context window and up to 16k output tokens.",
"doubao-seed-1.6-vision.description": "Doubao-Seed-1.6-vision is a visual deep-reasoning model that delivers stronger multimodal understanding and reasoning for education, image review, inspection/security, and AI search Q&A. It supports a 256k context window and up to 64k output tokens.",
"doubao-seed-1.6.description": "Doubao-Seed-1.6 is a new multimodal deep-reasoning model with auto, thinking, and non-thinking modes. In non-thinking mode, it significantly outperforms Doubao-1.5-pro/250115. It supports a 256k context window and up to 16k output tokens.",
"doubao-seed-1.8.description": "Doubao-Seed-1.8 has stronger multimodal understanding and Agent capabilities, supports text/image/video input and context caching, and can deliver excellent performance in complex tasks.",
"doubao-seed-2.0-code.description": "Doubao-Seed-2.0-code is deeply optimized for agentic coding, supports multimodal inputs and a 256k context window, fitting coding, vision understanding, and agent workflows.",
"doubao-seed-2.0-lite.description": "Doubao-Seed-2.0-lite is a new multimodal deep-reasoning model that delivers better value and a strong choice for common tasks, with a context window up to 256k.",
"doubao-seed-2.0-mini.description": "Doubao-Seed-2.0-mini is a lightweight model with fast response and high performance, suitable for small tasks and high-concurrency scenarios.",
"doubao-seed-2.0-pro.description": "Doubao-Seed-2.0-pro is ByteDance's flagship Agent general model, with all-around leaps in complex task planning and execution capabilities.",
"doubao-seed-code.description": "Doubao-Seed-Code is deeply optimized for agentic coding, supports multimodal inputs (text/image/video) and a 256k context window, is compatible with the Anthropic API, and fits coding, vision understanding, and agent workflows.",
"doubao-seedance-1-0-lite-i2v-250428.description": "Stable generation quality with high cost-effectiveness, capable of generating videos from a first frame, first-and-last frames, or reference images.",
"doubao-seedance-1-0-lite-t2v-250428.description": "Stable generation quality with high cost-effectiveness, capable of generating videos based on text instructions.",
"doubao-seedance-1-0-pro-250528.description": "Seedance 1.0 Pro is a video generation foundation model that supports multi-shot storytelling. It delivers strong performance across multiple dimensions. The model achieves breakthroughs in semantic understanding and instruction following, enabling it to generate 1080P high-definition videos with smooth motion, rich details, diverse styles, and cinematic-level visual aesthetics.",
"doubao-seedance-1-0-pro-fast-251015.description": "Seedance 1.0 Pro Fast is a comprehensive model designed to minimize cost while maximizing performance, achieving an excellent balance between video generation quality, speed, and price. It inherits the core strengths of Seedance 1.0 Pro, while offering faster generation speeds and more competitive pricing, delivering creators a dual optimization of efficiency and cost.",
"doubao-seedance-1-5-pro-251215.description": "Seedance 1.5 Pro by ByteDance supports text-to-video, image-to-video (first frame, first+last frame), and audio generation synchronized with visuals.",
"doubao-seededit-3-0-i2i-250628.description": "The Doubao image model from ByteDance Seed supports text and image inputs with highly controllable, high-quality image generation. It supports text-guided image editing, with output sizes between 512 and 1536 on the long side.",
"doubao-seedream-3-0-t2i-250415.description": "Seedream 3.0 is an image generation model from ByteDance Seed, supporting text and image inputs with highly controllable, high-quality image generation. It generates images from text prompts.",
"doubao-seedream-4-0-250828.description": "Seedream 4.0 is an image generation model from ByteDance Seed, supporting text and image inputs with highly controllable, high-quality image generation. It generates images from text prompts.",
"doubao-seedream-4-5-251128.description": "Seedream 4.5 is ByteDances latest multimodal image model, integrating text-to-image, image-to-image, and batch image generation capabilities, while incorporating commonsense and reasoning abilities. Compared to the previous 4.0 version, it delivers significantly improved generation quality, with better editing consistency and multi-image fusion. It offers more precise control over visual details, producing small text and small faces more naturally, and achieves more harmonious layout and color, enhancing overall aesthetics.",
"doubao-seedream-5-0-260128.description": "Doubao-Seedream-5.0-lite is ByteDances latest image-generation model. For the first time, it integrates online retrieval capabilities, allowing it to incorporate real-time web information and enhance the timeliness of generated images. The models intelligence has also been upgraded, enabling precise interpretation of complex instructions and visual content. Additionally, it offers improved global knowledge coverage, reference consistency, and generation quality in professional scenarios, better meeting enterprise-level visual creation needs.",
"emohaa.description": "Emohaa is a mental health model with professional counseling abilities to help users understand emotional issues.",
"ernie-4.5-0.3b.description": "ERNIE 4.5 0.3B is an open-source lightweight model for local and customized deployment.",
"ernie-4.5-21b-a3b-thinking.description": "ERNIE-4.5-21B-A3B-Thinking is a text MoE (Mixture-of-Experts) post-trained model with a total of 21B parameters and 3B active parameters, offering significantly enhanced reasoning quality and depth.",
"ernie-4.5-21b-a3b.description": "ERNIE 4.5 21B A3B is an open-source large-parameter model with stronger understanding and generation.",
"ernie-4.5-300b-a47b.description": "ERNIE 4.5 300B A47B is Baidu ERNIEs ultra-large MoE model with excellent reasoning.",
"ernie-4.5-8k-preview.description": "ERNIE 4.5 8K Preview is an 8K context preview model for evaluating ERNIE 4.5.",
"ernie-4.5-turbo-128k-preview.description": "ERNIE 4.5 Turbo 128K preview with release-level capabilities, suitable for integration and canary testing.",
"ernie-4.5-turbo-128k.description": "ERNIE 4.5 Turbo 128K is a high-performance general model with search augmentation and tool calling for QA, coding, and agent scenarios.",
"ernie-4.5-turbo-32k.description": "ERNIE 4.5 Turbo 32K is a mid-length context version for QA, knowledge base retrieval, and multi-turn dialogue.",
"ernie-4.5-turbo-latest.description": "Latest ERNIE 4.5 Turbo with optimized overall performance, ideal as the primary production model.",
"ernie-4.5-turbo-vl-32k-preview.description": "ERNIE 4.5 Turbo VL 32K Preview is a 32K multimodal preview for evaluating long-context vision ability.",
"ernie-4.5-turbo-vl-32k.description": "ERNIE 4.5 Turbo VL 32K is a mid-long multimodal version for combined long-doc and image understanding.",
"ernie-4.5-turbo-vl-latest.description": "ERNIE 4.5 Turbo VL Latest is the newest multimodal version with improved image-text understanding and reasoning.",
"ernie-4.5-turbo-vl-preview.description": "ERNIE 4.5 Turbo VL Preview is a multimodal preview model for image-text understanding and generation, suitable for visual QA and content comprehension.",
"ernie-4.5-turbo-vl.description": "ERNIE 4.5 Turbo VL is a mature multimodal model for production image-text understanding and recognition.",
"ernie-4.5-vl-28b-a3b.description": "ERNIE 4.5 VL 28B A3B is an open-source multimodal model for image-text understanding and reasoning.",
"ernie-5.0-thinking-latest.description": "Wenxin 5.0 Thinking is a native full-modal flagship model with unified text, image, audio, and video modeling. It delivers broad capability upgrades for complex QA, creation, and agent scenarios.",
"ernie-5.0-thinking-preview.description": "Wenxin 5.0 Thinking Preview is a native full-modal flagship model with unified text, image, audio, and video modeling. It delivers broad capability upgrades for complex QA, creation, and agent scenarios.",
"ernie-char-8k.description": "ERNIE Character 8K is a persona dialogue model for IP character building and long-term companionship chat.",
"ernie-char-fiction-8k-preview.description": "ERNIE Character Fiction 8K Preview is a character and plot creation model preview for feature evaluation and testing.",
"ernie-char-fiction-8k.description": "ERNIE Character Fiction 8K is a persona model for novels and plot creation, suited for long-form story generation.",
"ernie-irag-edit.description": "ERNIE iRAG Edit is an image editing model supporting erasing, repainting, and variant generation.",
"ernie-lite-pro-128k.description": "ERNIE Lite Pro 128K is a lightweight high-performance model for latency- and cost-sensitive scenarios.",
"ernie-novel-8k.description": "ERNIE Novel 8K is built for long-form novels and IP plots with multi-character narratives.",
"ernie-speed-pro-128k.description": "ERNIE Speed Pro 128K is a high-concurrency, high-value model for large-scale online services and enterprise apps.",
"ernie-x1-turbo-32k-preview.description": "ERNIE X1 Turbo 32K Preview is a fast thinking model with 32K context for complex reasoning and multi-turn chat.",
"ernie-x1-turbo-32k.description": "ERNIE X1 Turbo 32K is a fast thinking model with 32K context for complex reasoning and multi-turn chat.",
"ernie-x1.1-preview.description": "ERNIE X1.1 Preview is a thinking-model preview for evaluation and testing.",
"ernie-x1.1.description": "ERNIE X1.1 is a thinking-model preview for evaluation and testing.",
"fal-ai/bytedance/seedream/v4.5.description": "Seedream 4.5, built by ByteDance Seed team, supports multi-image editing and composition. Features enhanced subject consistency, precise instruction following, spatial logic understanding, aesthetic expression, poster layout and logo design with high-precision text-image rendering.",
"fal-ai/bytedance/seedream/v4.description": "Seedream 4.0, built by ByteDance Seed, supports text and image inputs for highly controllable, high-quality image generation from prompts.",
"fal-ai/flux-kontext/dev.description": "FLUX.1 model focused on image editing, supporting text and image inputs.",
"fal-ai/flux-pro/kontext.description": "FLUX.1 Kontext [pro] accepts text and reference images as input, enabling targeted local edits and complex global scene transformations.",
"fal-ai/flux/krea.description": "Flux Krea [dev] is an image generation model with an aesthetic bias toward more realistic, natural images.",
"fal-ai/flux/schnell.description": "FLUX.1 [schnell] is a 12B-parameter image generation model built for fast, high-quality output.",
"fal-ai/hunyuan-image/v3.description": "A powerful native multimodal image generation model.",
"fal-ai/imagen4/preview.description": "High-quality image generation model from Google.",
"fal-ai/nano-banana.description": "Nano Banana is Googles newest, fastest, and most efficient native multimodal model, enabling image generation and editing through conversation.",
"fal-ai/qwen-image-edit.description": "A professional image editing model from the Qwen team, supporting semantic and appearance edits, precise Chinese/English text editing, style transfer, rotation, and more.",
"fal-ai/qwen-image.description": "A powerful image generation model from the Qwen team with strong Chinese text rendering and diverse visual styles.",
"flux-1-schnell.description": "A 12B-parameter text-to-image model from Black Forest Labs using latent adversarial diffusion distillation to generate high-quality images in 1-4 steps. It rivals closed alternatives and is released under Apache-2.0 for personal, research, and commercial use.",
"flux-dev.description": "FLUX.1 [dev] is an open-weights distilled model for non-commercial use. It keeps near-pro image quality and instruction following while running more efficiently, using resources better than same-size standard models.",
"flux-kontext-max.description": "State-of-the-art contextual image generation and editing, combining text and images for precise, coherent results.",
"flux-kontext-pro.description": "State-of-the-art contextual image generation and editing, combining text and images for precise, coherent results.",
"flux-merged.description": "FLUX.1 [merged] combines the deep features explored in \"DEV\" with the high-speed advantages of \"Schnell\", extending performance limits and broadening applications.",
"flux-pro-1.1-ultra.description": "Ultra-high-resolution image generation with 4MP output, producing crisp images in 10 seconds.",
"flux-pro-1.1.description": "Upgraded professional-grade image generation model with excellent image quality and precise prompt adherence.",
"flux-pro.description": "Top-tier commercial image generation model with unmatched image quality and diverse outputs.",
"flux-schnell.description": "FLUX.1 [schnell] is the most advanced open-source few-step model, surpassing similar competitors and even strong non-distilled models like Midjourney v6.0 and DALL-E 3 (HD). It is finely tuned to preserve pretraining diversity, significantly improving visual quality, instruction following, size/aspect variation, font handling, and output diversity.",
"flux.1-schnell.description": "FLUX.1-schnell is a high-performance image generation model for fast multi-style outputs.",
"gemini-1.0-pro-001.description": "Gemini 1.0 Pro 001 (Tuning) provides stable, tunable performance for complex tasks.",
"gemini-1.0-pro-002.description": "Gemini 1.0 Pro 002 (Tuning) provides strong multimodal support for complex tasks.",
"gemini-1.0-pro-latest.description": "Gemini 1.0 Pro is Googles high-performance AI model designed for broad task scaling.",
"gemini-1.5-flash-001.description": "Gemini 1.5 Flash 001 is an efficient multimodal model for broad application scaling.",
"gemini-1.5-flash-002.description": "Gemini 1.5 Flash 002 is an efficient multimodal model built for broad deployment.",
"gemini-1.5-flash-8b-exp-0924.description": "Gemini 1.5 Flash 8B 0924 is the latest experimental model with notable gains across text and multimodal use cases.",
"gemini-1.5-flash-8b-latest.description": "Gemini 1.5 Flash 8B is an efficient multimodal model built for broad deployment.",
"gemini-1.5-flash-8b.description": "Gemini 1.5 Flash 8B is an efficient multimodal model for broad application scaling.",
"gemini-1.5-flash-exp-0827.description": "Gemini 1.5 Flash 0827 delivers optimized multimodal processing for complex tasks.",
"gemini-1.5-flash-latest.description": "Gemini 1.5 Flash is Googles latest multimodal AI model with fast processing, supporting text, image, and video inputs for efficient scaling across tasks.",
"gemini-1.5-pro-001.description": "Gemini 1.5 Pro 001 is a scalable multimodal AI solution for complex tasks.",
"gemini-1.5-pro-002.description": "Gemini 1.5 Pro 002 is the latest production-ready model with higher-quality output, especially for math, long context, and vision tasks.",
"gemini-1.5-pro-exp-0801.description": "Gemini 1.5 Pro 0801 provides strong multimodal processing with greater flexibility for app development.",
"gemini-1.5-pro-exp-0827.description": "Gemini 1.5 Pro 0827 applies latest optimizations for more efficient multimodal processing.",
"gemini-1.5-pro-latest.description": "Gemini 1.5 Pro supports up to 2 million tokens, an ideal mid-sized multimodal model for complex tasks.",
"gemini-2.0-flash-001.description": "Gemini 2.0 Flash delivers next-gen features including exceptional speed, native tool use, multimodal generation, and a 1M-token context window.",
"gemini-2.0-flash-exp-image-generation.description": "Gemini 2.0 Flash experimental model with image generation support.",
"gemini-2.0-flash-lite-001.description": "A Gemini 2.0 Flash variant optimized for cost efficiency and low latency.",
"gemini-2.0-flash-lite.description": "A Gemini 2.0 Flash variant optimized for cost efficiency and low latency.",
"gemini-2.0-flash.description": "Gemini 2.0 Flash delivers next-gen features including exceptional speed, native tool use, multimodal generation, and a 1M-token context window.",
"gemini-2.5-flash-image.description": "Nano Banana is Googles newest, fastest, and most efficient native multimodal model, enabling conversational image generation and editing.",
"gemini-2.5-flash-image:image.description": "Nano Banana is Googles newest, fastest, and most efficient native multimodal model, enabling conversational image generation and editing.",
"gemini-2.5-flash-lite-preview-06-17.description": "Gemini 2.5 Flash-Lite Preview is Googles smallest, best-value model, designed for large-scale use.",
"gemini-2.5-flash-lite-preview-09-2025.description": "Preview release (September 25th, 2025) of Gemini 2.5 Flash-Lite",
"gemini-2.5-flash-lite.description": "Gemini 2.5 Flash-Lite is Googles smallest, best-value model, designed for large-scale use.",
"gemini-2.5-flash-preview-04-17.description": "Gemini 2.5 Flash Preview is Googles best-value model with full capabilities.",
"gemini-2.5-flash.description": "Gemini 2.5 Flash is Googles best-value model with full capabilities.",
"gemini-2.5-pro-preview-03-25.description": "Gemini 2.5 Pro Preview is Googles most advanced reasoning model, able to reason over code, math, and STEM problems and analyze large datasets, codebases, and documents with long context.",
"gemini-2.5-pro-preview-05-06.description": "Gemini 2.5 Pro Preview is Googles most advanced reasoning model, able to reason over code, math, and STEM problems and analyze large datasets, codebases, and documents with long context.",
"gemini-2.5-pro-preview-06-05.description": "Gemini 2.5 Pro Preview is Googles most advanced reasoning model, able to reason over code, math, and STEM problems and analyze large datasets, codebases, and documents with long context.",
"gemini-2.5-pro.description": "Gemini 2.5 Pro is Googles most advanced reasoning model, able to reason over code, math, and STEM problems and analyze large datasets, codebases, and documents with long context.",
"gemini-3-flash-preview.description": "Gemini 3 Flash is the smartest model built for speed, combining cutting-edge intelligence with excellent search grounding.",
"gemini-3-pro-image-preview.description": "Gemini 3 Pro Image (Nano Banana Pro) is Google's image generation model that also supports multimodal dialogue.",
"gemini-3-pro-image-preview:image.description": "Gemini 3 Pro Image (Nano Banana Pro) is Google's image generation model and also supports multimodal chat.",
"gemini-3-pro-preview.description": "Gemini 3 Pro is Googles most powerful agent and vibe-coding model, delivering richer visuals and deeper interaction on top of state-of-the-art reasoning.",
"gemini-3.1-flash-image-preview.description": "Gemini 3.1 Flash Image (Nano Banana 2) is Google's fastest native image generation model with thinking support, conversational image generation and editing.",
"gemini-3.1-flash-image-preview:image.description": "Gemini 3.1 Flash Image (Nano Banana 2) delivers Pro-level image quality at Flash speed with multimodal chat support.",
"gemini-3.1-flash-lite-preview.description": "Gemini 3.1 Flash-Lite Preview is Google's most cost-efficient multimodal model, optimized for high-volume agentic tasks, translation, and data processing.",
"gemini-3.1-pro-preview.description": "Gemini 3.1 Pro Preview improves on Gemini 3 Pro with enhanced reasoning capabilities and adds medium thinking level support.",
"gemini-flash-latest.description": "Latest release of Gemini Flash",
"gemini-flash-lite-latest.description": "Latest release of Gemini Flash-Lite",
"gemini-pro-latest.description": "Latest release of Gemini Pro",
"gemma-7b-it.description": "Gemma 7B is cost-effective for small to mid-scale tasks.",
"gemma2-9b-it.description": "Gemma 2 9B is optimized for specific tasks and tool integration.",
"gemma2.description": "Gemma 2 is Googles efficient model, covering use cases from small apps to complex data processing.",
"gemma2:27b.description": "Gemma 2 is Googles efficient model, covering use cases from small apps to complex data processing.",
"gemma2:2b.description": "Gemma 2 is Googles efficient model, covering use cases from small apps to complex data processing.",
"generalv3.5.description": "Spark Max is the most full-featured version, supporting web search and many built-in plugins. Its fully optimized core capabilities, system roles, and function calling deliver excellent performance across complex application scenarios.",
"generalv3.description": "Spark Pro is a high-performance LLM optimized for professional domains, focusing on math, programming, healthcare, and education, with web search and built-in plugins such as weather and date. It delivers strong performance and efficiency in complex knowledge Q&A, language understanding, and advanced text creation, making it an ideal choice for professional use cases.",
"glm-4-0520.description": "GLM-4-0520 is the latest model version, designed for highly complex and diverse tasks with excellent performance.",
"glm-4-7.description": "GLM-4.7 is the latest flagship model from Zhipu AI. GLM-4.7 enhances coding capabilities, long-term task planning, and tool collaboration for Agentic Coding scenarios, achieving leading performance among open-source models in multiple public benchmarks. General capabilities are improved, with more concise and natural responses, and more immersive writing. In complex agent tasks, instruction following is stronger during tool calls, and the aesthetics of Artifacts and Agentic Coding frontend, as well as long-term task completion efficiency, are further enhanced. • Stronger programming capabilities: Significantly improved multi-language coding and terminal agent performance; GLM-4.7 can now implement \"think first, then act\" mechanisms in programming frameworks like Claude Code, Kilo Code, TRAE, Cline, and Roo Code, with more stable performance on complex tasks. • Frontend aesthetics improvement: GLM-4.7 shows significant progress in frontend generation quality, capable of generating websites, PPTs, and posters with better visual appeal. • Stronger tool calling capabilities: GLM-4.7 enhances tool calling abilities, scoring 67 in BrowseComp web task evaluation; achieving 84.7 in τ²-Bench interactive tool calling evaluation, surpassing Claude Sonnet 4.5 as the open-source SOTA. • Reasoning capability improvement: Significantly enhanced math and reasoning abilities, scoring 42.8% in the HLE (\"Humanity's Last Exam\") benchmark, a 41% improvement over GLM-4.6, surpassing GPT-5.1. • General capability enhancement: GLM-4.7 conversations are more concise, intelligent, and humane; writing and role-playing are more literary and immersive.",
"glm-4-9b-chat.description": "GLM-4-9B-Chat performs strongly across semantics, math, reasoning, code, and knowledge. It also supports web browsing, code execution, custom tool calling, and long-text reasoning, with support for 26 languages including Japanese, Korean, and German.",
"glm-4-air-250414.description": "GLM-4-Air is a high-value option with performance close to GLM-4, fast speed, and lower cost.",
"glm-4-air.description": "GLM-4-Air is a high-value option with performance close to GLM-4, fast speed, and lower cost.",
"glm-4-airx.description": "GLM-4-AirX is a more efficient GLM-4-Air variant with up to 2.6x faster reasoning.",
"glm-4-alltools.description": "GLM-4-AllTools is a versatile agent model optimized for complex instruction planning and tool use such as web browsing, code explanation, and text generation, suitable for multi-task execution.",
"glm-4-flash-250414.description": "GLM-4-Flash is ideal for simple tasks: fastest and free.",
"glm-4-flash.description": "GLM-4-Flash is ideal for simple tasks: fastest and free.",
"glm-4-flashx.description": "GLM-4-FlashX is an enhanced Flash version with ultra-fast reasoning.",
"glm-4-long.description": "GLM-4-Long supports ultra-long inputs for memory-style tasks and large-scale document processing.",
"glm-4-plus.description": "GLM-4-Plus is a high-intelligence flagship with strong long-text and complex-task handling and upgraded overall performance.",
"glm-4.1v-thinking-flash.description": "GLM-4.1V-Thinking is the strongest known ~10B VLM, covering SOTA tasks like video understanding, image QA, subject solving, OCR, document and chart reading, GUI agents, frontend coding, and grounding. It even surpasses the 8x larger Qwen2.5-VL-72B on many tasks. With advanced RL, it uses chain-of-thought reasoning to improve accuracy and richness, outperforming traditional non-thinking models in both outcomes and explainability.",
"glm-4.1v-thinking-flashx.description": "GLM-4.1V-Thinking is the strongest known ~10B VLM, covering SOTA tasks like video understanding, image QA, subject solving, OCR, document and chart reading, GUI agents, frontend coding, and grounding. It even surpasses the 8x larger Qwen2.5-VL-72B on many tasks. With advanced RL, it uses chain-of-thought reasoning to improve accuracy and richness, outperforming traditional non-thinking models in both outcomes and explainability.",
"glm-4.5-air.description": "GLM-4.5 lightweight edition that balances performance and cost, with flexible hybrid thinking modes.",
"glm-4.5-airx.description": "GLM-4.5-Air fast edition with quicker responses for high-scale, high-speed use.",
"glm-4.5-x.description": "GLM-4.5 fast edition, delivering strong performance with generation speeds up to 100 tokens/sec.",
"glm-4.5.description": "Zhipu flagship model with a switchable thinking mode, delivering open-source SOTA overall and up to 128K context.",
"glm-4.5v.description": "Zhipus next-generation MoE vision reasoning model has 106B total parameters with 12B active, achieving SOTA among similarly sized open-source multimodal models across image, video, document understanding, and GUI tasks.",
"glm-4.6.description": "Zhipu's latest flagship model GLM-4.6 (355B) fully surpasses its predecessors in advanced coding, long-text processing, reasoning, and agent capabilities. It particularly aligns with Claude Sonnet 4 in programming ability, becoming China's top Coding model.",
"glm-4.6v-flash.description": "The GLM-4.6V series represents a major iteration of the GLM family in the multimodal direction, comprising GLM-4.6V (flagship), GLM-4.6V-FlashX (lightweight and high-speed), and GLM-4.6V-Flash (fully free). It extends the training-time context window to 128k tokens, achieves state-of-the-art visual understanding accuracy at comparable parameter scales, and, for the first time, natively integrates Function Call (tool invocation) capabilities into the visual model architecture. This unifies the pipeline from “visual perception” to “executable actions,” providing a consistent technical foundation for multimodal agents in real-world production scenarios.",
"glm-4.6v-flashx.description": "The GLM-4.6V series represents a major iteration of the GLM family in the multimodal direction, comprising GLM-4.6V (flagship), GLM-4.6V-FlashX (lightweight and high-speed), and GLM-4.6V-Flash (fully free). It extends the training-time context window to 128k tokens, achieves state-of-the-art visual understanding accuracy at comparable parameter scales, and, for the first time, natively integrates Function Call (tool invocation) capabilities into the visual model architecture. This unifies the pipeline from “visual perception” to “executable actions,” providing a consistent technical foundation for multimodal agents in real-world production scenarios.",
"glm-4.6v.description": "The GLM-4.6V series represents a major iteration of the GLM family in the multimodal direction, comprising GLM-4.6V (flagship), GLM-4.6V-FlashX (lightweight and high-speed), and GLM-4.6V-Flash (fully free). It extends the training-time context window to 128k tokens, achieves state-of-the-art visual understanding accuracy at comparable parameter scales, and, for the first time, natively integrates Function Call (tool invocation) capabilities into the visual model architecture. This unifies the pipeline from “visual perception” to “executable actions,” providing a consistent technical foundation for multimodal agents in real-world production scenarios.",
"glm-4.7-flash.description": "GLM-4.7-Flash, as a 30B-level SOTA model, offers a new choice that balances performance and efficiency. It enhances coding capabilities, long-term task planning, and tool collaboration for Agentic Coding scenarios, achieving leading performance among open-source models of the same size in multiple current benchmark leaderboards. In executing complex intelligent agent tasks, it has stronger instruction compliance during tool calls, and further improves the aesthetics of front-end and the efficiency of long-term task completion for Artifacts and Agentic Coding.",
"glm-4.7-flashx.description": "GLM-4.7-Flash, as a 30B-level SOTA model, offers a new choice that balances performance and efficiency. It enhances coding capabilities, long-term task planning, and tool collaboration for Agentic Coding scenarios, achieving leading performance among open-source models of the same size in multiple current benchmark leaderboards. In executing complex intelligent agent tasks, it has stronger instruction compliance during tool calls, and further improves the aesthetics of front-end and the efficiency of long-term task completion for Artifacts and Agentic Coding.",
"glm-4.7.description": "GLM-4.7 is Zhipu's latest flagship model, enhanced for Agentic Coding scenarios with improved coding capabilities, long-term task planning, and tool collaboration. It achieves leading performance among open-source models on multiple public benchmarks. General capabilities are improved with more concise and natural responses and more immersive writing. For complex agent tasks, instruction following during tool calls is stronger, and the frontend aesthetics and long-term task completion efficiency of Artifacts and Agentic Coding are further enhanced.",
"glm-4.description": "GLM-4 is the older flagship released in Jan 2024, now replaced by the stronger GLM-4-0520.",
"glm-4v-flash.description": "GLM-4V-Flash focuses on efficient single-image understanding for fast analysis scenarios such as real-time or batch image processing.",
"glm-4v-plus-0111.description": "GLM-4V-Plus understands video and multiple images, suitable for multimodal tasks.",
"glm-4v-plus.description": "GLM-4V-Plus understands video and multiple images, suitable for multimodal tasks.",
"glm-4v.description": "GLM-4V provides strong image understanding and reasoning across visual tasks.",
"glm-5.description": "GLM-5 is Zhipus next-generation flagship foundation model, purpose-built for Agentic Engineering. It delivers reliable productivity in complex systems engineering and long-horizon agentic tasks. In coding and agent capabilities, GLM-5 achieves state-of-the-art performance among open-source models. In real-world programming scenarios, its user experience approaches that of Claude Opus 4.5. It excels at complex systems engineering and long-horizon agent tasks, making it an ideal foundation model for general-purpose agent assistants.",
"glm-image.description": "GLM-Image is Zhipus new flagship image generation model. The model was trained end-to-end on domestically produced chips and adopts an original hybrid architecture that combines autoregressive modeling with a diffusion decoder. This design enables strong global instruction understanding alongside fine-grained local detail rendering, overcoming long-standing challenges in generating knowledge-dense content such as posters, presentations, and educational diagrams. It represents an important exploration toward a new generation of “cognitive generative” technology paradigms, exemplified by Nano Banana Pro.",
"glm-z1-air.description": "Reasoning model with strong reasoning for tasks that require deep inference.",
"glm-z1-airx.description": "Ultra-fast reasoning with high reasoning quality.",
"glm-z1-flash.description": "GLM-Z1 series provides strong complex reasoning, excelling in logic, math, and programming.",
"glm-z1-flashx.description": "Fast and low-cost: Flash-enhanced with ultra-fast reasoning and higher concurrency.",
"glm-zero-preview.description": "GLM-Zero-Preview delivers strong complex reasoning, excelling in logic, math, and programming.",
"global.anthropic.claude-haiku-4-5-20251001-v1:0.description": "Claude Haiku 4.5 is Anthropic's fastest and most intelligent Haiku model, with lightning speed and extended thinking.",
"global.anthropic.claude-opus-4-5-20251101-v1:0.description": "Claude Opus 4.5 is Anthropic's flagship model, combining exceptional intelligence and scalable performance for complex tasks requiring the highest-quality responses and reasoning.",
"global.anthropic.claude-opus-4-6-v1.description": "Claude Opus 4.6 is Anthropic's most intelligent model for building agents and coding.",
"global.anthropic.claude-sonnet-4-5-20250929-v1:0.description": "Claude Sonnet 4.5 is Anthropic's most intelligent model to date.",
"global.anthropic.claude-sonnet-4-6.description": "Claude Sonnet 4.6 is Anthropics best combination of speed and intelligence.",
"google/gemini-2.0-flash-001.description": "Gemini 2.0 Flash delivers next-gen capabilities, including excellent speed, native tool use, multimodal generation, and a 1M-token context window.",
"google/gemini-2.0-flash-lite-001.description": "Gemini 2.0 Flash Lite is a lightweight Gemini variant with thinking disabled by default to improve latency and cost, but it can be enabled via parameters.",
"google/gemini-2.0-flash-lite.description": "Gemini 2.0 Flash Lite delivers next-gen features including exceptional speed, built-in tool use, multimodal generation, and a 1M-token context window.",
"google/gemini-2.0-flash.description": "Gemini 2.0 Flash is Googles high-performance reasoning model for extended multimodal tasks.",
"google/gemini-2.5-flash-image-preview.description": "Gemini 2.5 Flash experimental model with image generation support.",
"google/gemini-2.5-flash-image.description": "Gemini 2.5 Flash Image (Nano Banana) is Googles image generation model with multimodal conversation support.",
"google/gemini-2.5-flash-lite.description": "Gemini 2.5 Flash Lite is the lightweight Gemini 2.5 variant optimized for latency and cost, suitable for high-throughput scenarios.",
"google/gemini-2.5-flash-preview.description": "Gemini 2.5 Flash is Googles most advanced flagship model, built for advanced reasoning, coding, math, and science tasks. It includes built-in “thinking” to deliver higher-accuracy responses with finer context processing.\n\nNote: This model has two variants—thinking and non-thinking. Output pricing differs significantly depending on whether thinking is enabled. If you choose the standard variant (without the “:thinking” suffix), the model will explicitly avoid generating thinking tokens.\n\nTo use thinking and receive thinking tokens, you must select the “:thinking” variant, which incurs higher thinking output pricing.\n\nGemini 2.5 Flash can also be configured via the “max reasoning tokens” parameter as documented (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).",
"google/gemini-2.5-flash-preview:thinking.description": "Gemini 2.5 Flash is Googles most advanced flagship model, built for advanced reasoning, coding, math, and science tasks. It includes built-in “thinking” to deliver higher-accuracy responses with finer context processing.\n\nNote: This model has two variants—thinking and non-thinking. Output pricing differs significantly depending on whether thinking is enabled. If you choose the standard variant (without the “:thinking” suffix), the model will explicitly avoid generating thinking tokens.\n\nTo use thinking and receive thinking tokens, you must select the “:thinking” variant, which incurs higher thinking output pricing.\n\nGemini 2.5 Flash can also be configured via the “max reasoning tokens” parameter as documented (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).",
"google/gemini-2.5-flash.description": "Gemini 2.5 Flash is Googles family spanning low latency to high-performance reasoning.",
"google/gemini-2.5-pro-preview.description": "Gemini 2.5 Pro Preview is Googles most advanced thinking model for reasoning over complex problems in code, math, and STEM, and for analyzing large datasets, codebases, and documents with long context.",
"google/gemini-2.5-pro.description": "Gemini 2.5 Pro is Googles flagship reasoning model with long context support for complex tasks.",
"google/gemini-3-pro-image-preview.description": "Gemini 3 Pro Image (Nano Banana Pro) is Googles image generation model with multimodal conversation support.",
"google/gemini-3-pro-preview.description": "Gemini 3 Pro is the next-generation multimodal reasoning model in the Gemini family, understanding text, audio, images, and video, and handling complex tasks and large codebases.",
"google/gemini-embedding-001.description": "A state-of-the-art embedding model with strong performance in English, multilingual, and code tasks.",
"google/gemini-flash-1.5.description": "Gemini 1.5 Flash provides optimized multimodal processing for a range of complex tasks.",
"google/gemini-pro-1.5.description": "Gemini 1.5 Pro combines the latest optimizations for more efficient multimodal data processing.",
"google/gemma-2-27b-it.description": "Gemma 2 27B is a general-purpose LLM with strong performance across many scenarios.",
"google/gemma-2-27b.description": "Gemma 2 is Googles efficient model family for use cases from small apps to complex data processing.",
"google/gemma-2-2b-it.description": "An advanced small language model designed for edge applications.",
"google/gemma-2-9b-it.description": "Gemma 2 9B, developed by Google, offers efficient instruction following and solid overall capability.",
"google/gemma-2-9b-it:free.description": "Gemma 2 is Googles lightweight open-source text model family.",
"google/gemma-2-9b.description": "Gemma 2 is Googles efficient model family for use cases from small apps to complex data processing.",
"google/gemma-2b-it.description": "Gemma Instruct (2B) provides basic instruction handling for lightweight applications.",
"google/gemma-3-12b-it.description": "Gemma 3 12B is a Google open-source language model setting a new bar for efficiency and performance.",
"google/gemma-3-27b-it.description": "Gemma 3 27B is a Google open-source language model setting a new bar for efficiency and performance.",
"google/text-embedding-005.description": "An English-focused text embedding model optimized for code and English language tasks.",
"google/text-multilingual-embedding-002.description": "A multilingual text embedding model optimized for cross-lingual tasks across many languages.",
"gpt-3.5-turbo-0125.description": "GPT 3.5 Turbo for text generation and understanding; currently points to gpt-3.5-turbo-0125.",
"gpt-3.5-turbo-0613.description": "GPT 3.5 Turbo is a fast and efficient model for various tasks.",
"gpt-3.5-turbo-1106.description": "GPT 3.5 Turbo for text generation and understanding; currently points to gpt-3.5-turbo-0125.",
"gpt-3.5-turbo-instruct.description": "GPT 3.5 Turbo for text generation and understanding tasks, optimized for instruction following.",
"gpt-3.5-turbo.description": "GPT 3.5 Turbo for text generation and understanding; currently points to gpt-3.5-turbo-0125.",
"gpt-35-turbo-16k.description": "GPT-3.5 Turbo 16k is a high-capacity text generation model for complex tasks.",
"gpt-35-turbo.description": "GPT-3.5 Turbo is OpenAIs efficient model for chat and text generation, supporting parallel function calling.",
"gpt-4-0125-preview.description": "The latest GPT-4 Turbo adds vision. Visual requests support JSON mode and function calling. It is a cost-effective multimodal model that balances accuracy and efficiency for real-time applications.",
"gpt-4-0613.description": "GPT-4 provides a larger context window to handle longer inputs, suitable for broad information synthesis and data analysis.",
"gpt-4-1106-preview.description": "The latest GPT-4 Turbo adds vision. Visual requests support JSON mode and function calling. It is a cost-effective multimodal model that balances accuracy and efficiency for real-time applications.",
"gpt-4-32k-0613.description": "GPT-4 provides a larger context window to handle longer inputs for scenarios needing broad information integration and data analysis.",
"gpt-4-32k.description": "GPT-4 provides a larger context window to handle longer inputs for scenarios needing broad information integration and data analysis.",
"gpt-4-o-preview.description": "GPT-4o is the most advanced multimodal model, handling text and image inputs.",
"gpt-4-turbo-2024-04-09.description": "The latest GPT-4 Turbo adds vision. Visual requests support JSON mode and function calling. It is a cost-effective multimodal model that balances accuracy and efficiency for real-time applications.",
"gpt-4-turbo-preview.description": "The latest GPT-4 Turbo adds vision. Visual requests support JSON mode and function calling. It is a cost-effective multimodal model that balances accuracy and efficiency for real-time applications.",
"gpt-4-turbo.description": "The latest GPT-4 Turbo adds vision. Visual requests support JSON mode and function calling. It is a cost-effective multimodal model that balances accuracy and efficiency for real-time applications.",
"gpt-4-vision-preview.description": "GPT-4 Vision preview, designed for image analysis and processing tasks.",
"gpt-4.1-2025-04-14.description": "GPT-4.1 is the flagship model for complex tasks, ideal for cross-domain problem solving.",
"gpt-4.1-mini.description": "GPT-4.1 mini balances intelligence, speed, and cost, making it attractive for many use cases.",
"gpt-4.1-nano.description": "GPT-4.1 nano is the fastest and most cost-effective GPT-4.1 model.",
"gpt-4.1.description": "GPT-4.1 is our flagship model for complex tasks and cross-domain problem solving.",
"gpt-4.5-preview.description": "GPT-4.5-preview is the latest general-purpose model with deep world knowledge and better intent understanding, strong at creative tasks and agent planning. Its knowledge cutoff is October 2023.",
"gpt-4.description": "GPT-4 provides a larger context window to handle longer inputs, suitable for broad information synthesis and data analysis.",
"gpt-4o-2024-05-13.description": "ChatGPT-4o is a dynamic model updated in real time, combining strong understanding and generation for large-scale use cases like customer support, education, and technical support.",
"gpt-4o-2024-08-06.description": "ChatGPT-4o is a dynamic model updated in real time. It combines strong language understanding and generation for large-scale use cases like customer support, education, and technical assistance.",
"gpt-4o-2024-11-20.description": "ChatGPT-4o is a dynamic model updated in real time, combining strong understanding and generation for large-scale use cases like customer support, education, and technical support.",
"gpt-4o-audio-preview.description": "GPT-4o Audio Preview model with audio input and output.",
"gpt-4o-mini-2024-07-18.description": "GPT-4o mini is a cost-effective solution for a wide range of text and image tasks.",
"gpt-4o-mini-audio-preview.description": "GPT-4o mini Audio model with audio input and output.",
"gpt-4o-mini-realtime-preview.description": "GPT-4o-mini realtime variant with audio and text real-time I/O.",
"gpt-4o-mini-search-preview.description": "GPT-4o mini Search Preview is trained to understand and execute web search queries via the Chat Completions API. Web search is billed per tool call in addition to token costs.",
"gpt-4o-mini-transcribe.description": "GPT-4o Mini Transcribe is a speech-to-text model that transcribes audio with GPT-4o, improving word error rate, language ID, and accuracy over the original Whisper model.",
"gpt-4o-mini-tts.description": "GPT-4o mini TTS is a text-to-speech model built on GPT-4o mini, converting text into natural-sounding speech with a max input of 2000 tokens.",
"gpt-4o-mini.description": "GPT-4o mini is OpenAIs latest model after GPT-4 Omni, supporting text+image input with text output. It is their most advanced small model, far cheaper than recent frontier models and over 60% cheaper than GPT-3.5 Turbo, while maintaining top-tier intelligence (82% MMLU).",
"gpt-4o-realtime-preview-2024-10-01.description": "GPT-4o realtime variant with audio and text real-time I/O.",
"gpt-4o-realtime-preview-2025-06-03.description": "GPT-4o realtime variant with audio and text real-time I/O.",
"gpt-4o-realtime-preview.description": "GPT-4o realtime variant with audio and text real-time I/O.",
"gpt-4o-search-preview.description": "GPT-4o Search Preview is trained to understand and execute web search queries via the Chat Completions API. Web search is billed per tool call in addition to token costs.",
"gpt-4o-transcribe.description": "GPT-4o Transcribe is a speech-to-text model that transcribes audio with GPT-4o, improving word error rate, language ID, and accuracy over the original Whisper model.",
"gpt-4o.description": "ChatGPT-4o is a dynamic model updated in real time, combining strong understanding and generation for large-scale use cases like customer support, education, and technical support.",
"gpt-5-chat-latest.description": "The GPT-5 model used in ChatGPT, combining strong understanding and generation for conversational applications.",
"gpt-5-chat.description": "GPT-5 Chat is a preview model optimized for conversational scenarios. It supports text and image input, outputs text only, and fits chatbots and conversational AI applications.",
"gpt-5-codex.description": "GPT-5 Codex is a GPT-5 variant optimized for agentic coding tasks in Codex-like environments.",
"gpt-5-mini.description": "A faster, more cost-efficient GPT-5 variant for well-defined tasks, delivering quicker responses while maintaining quality.",
"gpt-5-nano.description": "The fastest and most cost-effective GPT-5 variant, ideal for latency- and cost-sensitive applications.",
"gpt-5-pro.description": "GPT-5 pro uses more compute to think deeper and consistently deliver better answers.",
"gpt-5.1-chat-latest.description": "GPT-5.1 Chat: the ChatGPT variant of GPT-5.1, built for chat scenarios.",
"gpt-5.1-codex-max.description": "GPT-5.1 Codex Max: OpenAI's most intelligent coding model, optimized for long-horizon agentic coding tasks, supports reasoning tokens.",
"gpt-5.1-codex-mini.description": "GPT-5.1 Codex mini: a smaller, lower-cost Codex variant optimized for agentic coding tasks.",
"gpt-5.1-codex.description": "GPT-5.1 Codex: a GPT-5.1 variant optimized for agentic coding tasks, for complex code/agent workflows in the Responses API.",
"gpt-5.1.description": "GPT-5.1 — a flagship model optimized for coding and agent tasks with configurable reasoning effort and longer context.",
"gpt-5.2-chat-latest.description": "GPT-5.2 Chat is the ChatGPT variant (chat-latest) for the latest conversation improvements.",
"gpt-5.2-codex.description": "GPT-5.2-Codex is an upgraded GPT-5.2 variant optimized for long-horizon, agentic coding tasks.",
"gpt-5.2-pro.description": "GPT-5.2 pro: a smarter, more precise GPT-5.2 variant (Responses API only), suited for hard problems and longer multi-turn reasoning.",
"gpt-5.2.description": "GPT-5.2 is a flagship model for coding and agentic workflows with stronger reasoning and long-context performance.",
"gpt-5.3-chat-latest.description": "GPT-5.3 Chat is the latest ChatGPT model used in ChatGPT with improved conversation experiences.",
"gpt-5.3-codex.description": "GPT-5.3-Codex is the most capable agentic coding model to date, optimized for agentic coding tasks in Codex or similar environments.",
"gpt-5.4-mini.description": "GPT-5.4 mini is OpenAI's strongest mini model for coding, computer use, and subagents.",
"gpt-5.4-nano.description": "GPT-5.4 nano is OpenAI's cheapest GPT-5.4-class model for simple high-volume tasks.",
"gpt-5.4-pro.description": "GPT-5.4 Pro uses more compute to think harder and provide consistently better answers, available in the Responses API only.",
"gpt-5.4.description": "GPT-5.4 is the frontier model for complex professional work with highest reasoning capability.",
"gpt-5.description": "The best model for cross-domain coding and agent tasks. GPT-5 leaps in accuracy, speed, reasoning, context awareness, structured thinking, and problem solving.",
"gpt-audio.description": "GPT Audio is a general chat model for audio input/output, supported in the Chat Completions API.",
"gpt-image-1-mini.description": "A lower-cost GPT Image 1 variant with native text and image input and image output.",
"gpt-image-1.5.description": "An enhanced GPT Image 1 model with 4× faster generation, more precise editing, and improved text rendering.",
"gpt-image-1.description": "ChatGPT native multimodal image generation model.",
"gpt-oss-120b.description": "Access requires an application. GPT-OSS-120B is an open-source large language model from OpenAI with strong text generation capability.",
"gpt-oss-20b.description": "Access requires an application. GPT-OSS-20B is an open-source mid-size language model from OpenAI with efficient text generation.",
"gpt-oss:120b.description": "GPT-OSS 120B is OpenAIs large open-source LLM using MXFP4 quantization and positioned as a flagship model. It requires multi-GPU or high-end workstation environments and delivers excellent performance in complex reasoning, code generation, and multilingual processing, with advanced function calling and tool integration.",
"gpt-oss:20b.description": "GPT-OSS 20B is an open-source LLM from OpenAI using MXFP4 quantization, suitable for high-end consumer GPUs or Apple Silicon Macs. It performs well in dialogue generation, coding, and reasoning tasks, supporting function calling and tool use.",
"gpt-realtime.description": "A general real-time model supporting real-time text and audio I/O, plus image input.",
"grok-3-mini.description": "A lightweight model that thinks before responding. Its fast and smart for logic tasks that dont require deep domain knowledge, with access to raw reasoning traces.",
"grok-3.description": "A flagship model that excels at enterprise use cases like data extraction, coding, and summarization, with deep domain knowledge in finance, healthcare, law, and science.",
"grok-4-0709.description": "xAIs Grok 4 with strong reasoning capability.",
"grok-4-1-fast-non-reasoning.description": "A frontier multimodal model optimized for high-performance agent tool use.",
"grok-4-1-fast-reasoning.description": "A frontier multimodal model optimized for high-performance agent tool use.",
"grok-4-fast-non-reasoning.description": "Were excited to release Grok 4 Fast, our latest progress in cost-effective reasoning models.",
"grok-4-fast-reasoning.description": "Were excited to release Grok 4 Fast, our latest progress in cost-effective reasoning models.",
"grok-4.20-beta-0309-non-reasoning.description": "A non-reasoning variant for simple use cases",
"grok-4.20-beta-0309-reasoning.description": "Intelligent, blazing-fast model that reasons before responding",
"grok-4.20-multi-agent-beta-0309.description": "A team of 4 or 16 agents, Excels at research use cases, Does not currently support client-side tools. Only supports xAI server side tools (eg X Search, Web Search tools) and remote MCP tools.",
"grok-4.description": "Our newest and strongest flagship model, excelling in NLP, math, and reasoning—an ideal all-rounder.",
"grok-code-fast-1.description": "Were excited to launch grok-code-fast-1, a fast and cost-effective reasoning model that excels at agentic coding.",
"grok-imagine-image-pro.description": "Generate images from text prompts, edit existing images with natural language, or iteratively refine images through multi-turn conversations.",
"grok-imagine-image.description": "Generate images from text prompts, edit existing images with natural language, or iteratively refine images through multi-turn conversations.",
"groq/compound-mini.description": "Compound-mini is a composite AI system powered by publicly available models supported on GroqCloud, intelligently and selectively using tools to answer user queries.",
"groq/compound.description": "Compound is a composite AI system powered by multiple publicly available models supported on GroqCloud, intelligently and selectively using tools to answer user queries.",
"gryphe/mythomax-l2-13b.description": "MythoMax L2 13B is a creative, intelligent language model merged from multiple top models.",
"hunyuan-2.0-instruct-20251111.description": "Release Features: The model base has been upgraded from TurboS to **Hunyuan 2.0**, resulting in comprehensive capability improvements. It significantly enhances instruction-following, multi-turn and long-form text understanding, literary creation, knowledge accuracy, coding, and reasoning abilities.",
"hunyuan-2.0-thinking-20251109.description": "Release Features: The model base has been upgraded from TurboS to **Hunyuan 2.0**, resulting in comprehensive capability improvements. It significantly enhances the models ability to follow complex instructions, understand multi-turn and long-form text, handle code, operate as an agent, and perform reasoning tasks.",
"hunyuan-a13b.description": "The first hybrid reasoning model from Hunyuan, upgraded from hunyuan-standard-256K (80B total, 13B active). It defaults to slow thinking and supports fast/slow switching via params or prefixing /no_think. Overall capability is improved over the previous generation, especially in math, science, long-text understanding, and agent tasks.",
"hunyuan-code.description": "Hunyuans latest code model trained on 200B high-quality code data plus six months of SFT data, with 8K context. It ranks near the top in automated code benchmarks and in expert human evaluations across five languages.",
"hunyuan-functioncall.description": "Hunyuans latest MoE FunctionCall model trained on high-quality tool-call data, with a 32K context window and leading benchmarks across dimensions.",
"hunyuan-large-longcontext.description": "Excels at long-document tasks like summarization and QA while also handling general generation. Strong at long-text analysis and generation for complex, detailed content.",
"hunyuan-large.description": "Hunyuan-large has ~389B total parameters and ~52B activated, the largest and strongest open MoE model in a Transformer architecture.",
"hunyuan-lite.description": "Upgraded to an MoE architecture with a 256k context window, leading many open models across NLP, code, math, and industry benchmarks.",
"hunyuan-pro.description": "Trillion-parameter MOE-32K long-context model leading benchmarks, strong at complex instructions and reasoning, advanced math, function calling, and optimized for multilingual translation, finance, law, and medical domains.",
"hunyuan-role.description": "Hunyuans latest roleplay model, officially fine-tuned with roleplay data, delivering stronger base performance in roleplay scenarios.",
"hunyuan-standard-256K.description": "Uses improved routing to mitigate load balancing and expert collapse. Achieves 99.9% needle-in-a-haystack on long context. MOE-256K further expands context length and quality.",
"hunyuan-standard.description": "Uses improved routing to mitigate load balancing and expert collapse. Achieves 99.9% needle-in-a-haystack on long context. MOE-32K offers strong value while handling long inputs.",
"hunyuan-t1-20250321.description": "Builds balanced arts and STEM capabilities with strong long-text information capture. Supports reasoning answers for math, logic, science, and code problems across difficulty levels.",
"hunyuan-t1-20250403.description": "Improves project-level code generation and writing quality, strengthens multi-turn topic understanding and ToB instruction following, improves word-level understanding, and reduces mixed simplified/traditional and Chinese/English output issues.",
"hunyuan-t1-20250529.description": "Improves creative writing and composition, strengthens frontend coding, math, and logic reasoning, and enhances instruction following.",
"hunyuan-t1-20250711.description": "Greatly improves hard math, logic, and coding, boosts output stability, and enhances long-text capability.",
"hunyuan-t1-latest.description": "Significantly improves the slow-thinking model on hard math, complex reasoning, difficult coding, instruction following, and creative writing quality.",
"hunyuan-t1-vision-20250916.description": "Latest t1-vision deep reasoning model with major improvements in VQA, visual grounding, OCR, charts, solving photographed problems, and image-based creation, plus stronger English and low-resource languages.",
"hunyuan-turbo-20241223.description": "This version boosts instruction scaling for better generalization, significantly improves math/code/logic reasoning, enhances word-level understanding, and improves writing quality.",
"hunyuan-turbo-latest.description": "General experience improvements across NLP understanding, writing, chat, QA, translation, and domains; more human-like responses, better clarification on ambiguous intent, improved word parsing, higher creative quality and interactivity, and stronger multi-turn conversations.",
"hunyuan-turbo.description": "Preview of Hunyuans next-gen LLM with a new MoE architecture, delivering faster reasoning and stronger results than hunyuan-pro.",
"hunyuan-turbos-latest.description": "The latest Hunyuan TurboS flagship model with stronger reasoning and a better overall experience.",
"hunyuan-turbos-longtext-128k-20250325.description": "Excels at long-document tasks like summarization and QA while also handling general generation. Strong at long-text analysis and generation for complex, detailed content.",
"hunyuan-turbos-vision-video.description": "Applicable to video understanding scenarios. Release features: Based on the **Hunyuan Turbos-Vision** video understanding model, supporting fundamental video understanding capabilities such as video description and video content question answering.",
"hunyuan-vision-1.5-instruct.description": "A fast-thinking image-to-text model built on the TurboS text base, showing notable improvements over the previous version in fundamental image recognition and image analysis reasoning.",
"hunyuan-vision.description": "Hunyuan latest multimodal model supporting image + text inputs to generate text.",
"image-01-live.description": "An image generation model with fine detail, supporting text-to-image and controllable style presets.",
"image-01.description": "A new image generation model with fine detail, supporting text-to-image and image-to-image.",
"imagen-4.0-fast-generate-001.description": "Imagen 4th generation text-to-image model series Fast version",
"imagen-4.0-generate-001.description": "Imagen 4th generation text-to-image model series",
"imagen-4.0-ultra-generate-001.description": "Imagen 4th generation text-to-image model series Ultra version",
"inception/mercury-coder-small.description": "Mercury Coder Small is ideal for code generation, debugging, and refactoring with minimal latency.",
"inclusionAI/Ling-flash-2.0.description": "Ling-flash-2.0 is the third Ling 2.0 architecture model from Ant Groups Bailing team. It is an MoE model with 100B total parameters but only 6.1B active per token (4.8B non-embedding). Despite its lightweight configuration, it matches or exceeds 40B dense models and even larger MoE models on multiple benchmarks, exploring high efficiency through architecture and training strategy.",
"inclusionAI/Ling-mini-2.0.description": "Ling-mini-2.0 is a small, high-performance MoE LLM with 16B total parameters and only 1.4B active per token (789M non-embedding), delivering very fast generation. With efficient MoE design and large high-quality training data, it achieves top-tier performance comparable to dense models under 10B and larger MoE models.",
"inclusionAI/Ring-flash-2.0.description": "Ring-flash-2.0 is a high-performance thinking model optimized from Ling-flash-2.0-base. It uses an MoE architecture with 100B total parameters and only 6.1B active per inference. Its icepop algorithm stabilizes RL training for MoE models, enabling continued gains in complex reasoning. It achieves major breakthroughs on tough benchmarks (math contests, code generation, logical reasoning), surpassing top dense models under 40B and rivaling larger open MoE and closed reasoning models. It also performs well in creative writing, and its efficient architecture delivers fast inference at lower deployment cost for high concurrency.",
"inclusionai/ling-1t.description": "Ling-1T is inclusionAIs 1T MoE model, optimized for high-intensity reasoning tasks and large-context workloads.",
"inclusionai/ling-flash-2.0.description": "Ling-flash-2.0 is inclusionAIs MoE model optimized for efficiency and reasoning performance, suitable for mid-to-large tasks.",
"inclusionai/ling-mini-2.0.description": "Ling-mini-2.0 is inclusionAIs lightweight MoE model, significantly reducing cost while retaining reasoning capability.",
"inclusionai/ming-flash-omini-preview.description": "Ming-flash-omni Preview is inclusionAIs multimodal model, supporting speech, image, and video inputs, with improved image rendering and speech recognition.",
"inclusionai/ring-1t.description": "Ring-1T is inclusionAIs trillion-parameter MoE reasoning model, suited for large-scale reasoning and research tasks.",
"inclusionai/ring-flash-2.0.description": "Ring-flash-2.0 is a Ring model variant from inclusionAI for high-throughput scenarios, emphasizing speed and cost efficiency.",
"inclusionai/ring-mini-2.0.description": "Ring-mini-2.0 is inclusionAI's high-throughput lightweight MoE model, built for concurrency.",
"intern-latest.description": "By default, it points to our latest released Intern series model, currently set to intern-s1-pro.",
"intern-s1-mini.description": "A lightweight multimodal large model with strong scientific reasoning capabilities.",
"intern-s1-pro.description": "We have launched our most advanced open-source multimodal reasoning model, currently the top-performing open-source multimodal large language model in terms of overall performance.",
"intern-s1.description": "The open-source multimodal reasoning model not only demonstrates strong general-purpose capabilities but also achieves state-of-the-art performance across a wide range of scientific tasks.",
"internlm/internlm2_5-7b-chat.description": "InternLM2.5-7B-Chat is an open-source chat model based on the InternLM2 architecture. The 7B model focuses on dialogue generation with Chinese/English support, using modern training for fluent, intelligent conversation. It suits many chat scenarios such as customer support and personal assistants.",
"internvl2.5-38b-mpo.description": "InternVL2.5 38B MPO is a multimodal pretrained model for complex image-text reasoning.",
"internvl3-14b.description": "InternVL3 14B is a mid-size multimodal model balancing performance and cost.",
"internvl3-1b.description": "InternVL3 1B is a lightweight multimodal model for resource-constrained deployment.",
"internvl3-38b.description": "InternVL3 38B is a large open-source multimodal model for high-accuracy image-text understanding.",
"internvl3.5-241b-a28b.description": "Our newly released multimodal large model features enhanced image-and-text understanding and long-sequence image comprehension capabilities, achieving performance comparable to leading closed-source models.",
"internvl3.5-latest.description": "By default, it points to the latest model in the InternVL3.5 series, currently set to internvl3.5-241b-a28b.",
"irag-1.0.description": "ERNIE iRAG is an image retrieval-augmented generation model for image search, image-text retrieval, and content generation.",
"jamba-large.description": "Our most powerful, advanced model, designed for complex enterprise tasks with outstanding performance.",
"jamba-mini.description": "The most efficient model in its class, balancing speed and quality with a smaller footprint.",
"jina-deepsearch-v1.description": "DeepSearch combines web search, reading, and reasoning for thorough investigations. Think of it as an agent that takes your research task, performs broad searches with multiple iterations, and only then produces an answer. The process involves continuous research, reasoning, and multi-angle problem solving, fundamentally different from standard LLMs that answer from pretraining data or traditional RAG systems that rely on one-shot surface search.",
"kimi-k2-0711-preview.description": "kimi-k2 is an MoE foundation model with strong coding and agent capabilities (1T total params, 32B active), outperforming other mainstream open models across reasoning, programming, math, and agent benchmarks.",
"kimi-k2-0905-preview.description": "kimi-k2-0905-preview offers a 256k context window, stronger agentic coding, better front-end code quality, and improved context understanding.",
"kimi-k2-instruct.description": "Kimi K2 Instruct is Kimis official reasoning model with long context for code, QA, and more.",
"kimi-k2-thinking-turbo.description": "High-speed K2 long-thinking variant with 256k context, strong deep reasoning, and 60100 tokens/sec output.",
"kimi-k2-thinking.description": "kimi-k2-thinking is a Moonshot AI thinking model with general agentic and reasoning abilities. It excels at deep reasoning and can solve hard problems via multi-step tool use.",
"kimi-k2-turbo-preview.description": "kimi-k2 is an MoE foundation model with strong coding and agent capabilities (1T total params, 32B active), outperforming other mainstream open models across reasoning, programming, math, and agent benchmarks.",
"kimi-k2.5.description": "Kimi K2.5 is the most capable Kimi model, delivering open-source SOTA in agent tasks, coding, and vision understanding. It supports multimodal inputs and both thinking and non-thinking modes.",
"kimi-k2.description": "Kimi-K2 is a MoE base model from Moonshot AI with strong coding and agent capabilities, totaling 1T parameters with 32B active. On benchmarks for general reasoning, coding, math, and agent tasks, it outperforms other mainstream open models.",
"kimi-k2:1t.description": "Kimi K2 is a large MoE LLM from Moonshot AI with 1T total parameters and 32B active per forward pass. It is optimized for agent capabilities including advanced tool use, reasoning, and code synthesis.",
"kuaishou/kat-coder-pro-v1.description": "KAT-Coder-Pro-V1 (limited-time free) focuses on code understanding and automation for efficient coding agents.",
"labs-devstral-small-2512.description": "Devstral Small 2 excels at using tools to explore code bases, edit multiple files, and power software engineering agents.",
"lite.description": "Spark Lite is a lightweight LLM with ultra-low latency and efficient processing. It is fully free and supports real-time web search. Its fast responses perform well on low-compute devices and for model fine-tuning, delivering strong cost efficiency and an intelligent experience, especially for knowledge Q&A, content generation, and search scenarios.",
"llama-3.1-70b-versatile.description": "Llama 3.1 70B delivers stronger AI reasoning for complex applications, supporting heavy compute with high efficiency and accuracy.",
"llama-3.1-8b-instant.description": "Llama 3.1 8B is a high-efficiency model with fast text generation, ideal for large-scale, cost-effective applications.",
"llama-3.1-instruct.description": "Llama 3.1 instruction-tuned model is optimized for chat and beats many open chat models on common industry benchmarks.",
"llama-3.2-11b-vision-instruct.description": "Strong image reasoning on high-resolution images, suited for visual understanding apps.",
"llama-3.2-11b-vision-preview.description": "Llama 3.2 is designed for tasks combining vision and text, excelling at image captioning and visual QA to bridge language generation and visual reasoning.",
"llama-3.2-90b-vision-instruct.description": "Advanced image reasoning for visual-understanding agent applications.",
"llama-3.2-90b-vision-preview.description": "Llama 3.2 is designed for tasks combining vision and text, excelling at image captioning and visual QA to bridge language generation and visual reasoning.",
"llama-3.2-vision-instruct.description": "Llama 3.2-Vision instruction-tuned model is optimized for visual recognition, image reasoning, captioning, and general image Q&A.",
"llama-3.3-70b-versatile.description": "Meta Llama 3.3 is a multilingual LLM with 70B parameters (text in/text out), offering pre-trained and instruction-tuned variants. The instruction-tuned text-only model is optimized for multilingual dialogue use cases and outperforms many available open and closed chat models on common industry benchmarks.",
"llama-3.3-instruct.description": "Llama 3.3 instruction-tuned model is optimized for chat and beats many open chat models on common industry benchmarks.",
"llama3-70b-8192.description": "Meta Llama 3 70B offers exceptional complexity handling for demanding projects.",
"llama3-8b-8192.description": "Meta Llama 3 8B delivers strong reasoning performance for diverse scenarios.",
"llama3-groq-70b-8192-tool-use-preview.description": "Llama 3 Groq 70B Tool Use provides strong tool-calling for efficient handling of complex tasks.",
"llama3-groq-8b-8192-tool-use-preview.description": "Llama 3 Groq 8B Tool Use is optimized for efficient tool use with fast parallel compute.",
"llama3.1-8b.description": "Llama 3.1 8B: a small, low-latency Llama variant for lightweight online inference and chat.",
"llama3.1.description": "Llama 3.1 is Metas leading model, scaling up to 405B parameters for complex dialogue, multilingual translation, and data analysis.",
"llama3.1:405b.description": "Llama 3.1 is Metas leading model, scaling up to 405B parameters for complex dialogue, multilingual translation, and data analysis.",
"llama3.1:70b.description": "Llama 3.1 is Metas leading model, scaling up to 405B parameters for complex dialogue, multilingual translation, and data analysis.",
"llava-v1.5-7b-4096-preview.description": "LLaVA 1.5 7B fuses visual processing to generate complex outputs from visual inputs.",
"llava.description": "LLaVA is a multimodal model combining a vision encoder and Vicuna for strong vision-language understanding.",
"llava:13b.description": "LLaVA is a multimodal model combining a vision encoder and Vicuna for strong vision-language understanding.",
"llava:34b.description": "LLaVA is a multimodal model combining a vision encoder and Vicuna for strong vision-language understanding.",
"magistral-medium-latest.description": "Magistral Medium 1.2 is a frontier reasoning model from Mistral AI (Sep 2025) with vision support.",
"magistral-small-2509.description": "Magistral Small 1.2 is an open-source small reasoning model from Mistral AI (Sep 2025) with vision support.",
"mathstral.description": "MathΣtral is built for scientific research and mathematical reasoning, with strong computation and explanation.",
"max-32k.description": "Spark Max 32K offers large-context processing with stronger context understanding and logical reasoning, supporting 32K-token inputs for long document reading and private knowledge Q&A.",
"megrez-3b-instruct.description": "Megrez 3B Instruct is a small, efficient model from Wuwen Xinqiong.",
"meituan/longcat-flash-chat.description": "An open-source non-thinking base model from Meituan optimized for dialogue and agent tasks, strong in tool use and complex multi-turn interactions.",
"meta-llama-3-70b-instruct.description": "A powerful 70B-parameter model that excels at reasoning, coding, and broad language tasks.",
"meta-llama-3-8b-instruct.description": "A versatile 8B-parameter model optimized for chat and text generation.",
"meta-llama-3.1-405b-instruct.description": "Llama 3.1 instruction-tuned text model optimized for multilingual chat, performing strongly on common industry benchmarks among open and closed chat models.",
"meta-llama-3.1-70b-instruct.description": "Llama 3.1 instruction-tuned text model optimized for multilingual chat, performing strongly on common industry benchmarks among open and closed chat models.",
"meta-llama-3.1-8b-instruct.description": "Llama 3.1 instruction-tuned text model optimized for multilingual chat, performing strongly on common industry benchmarks among open and closed chat models.",
"meta-llama/Llama-2-13b-chat-hf.description": "LLaMA-2 Chat (13B) provides strong language handling and a solid chat experience.",
"meta-llama/Llama-2-70b-hf.description": "LLaMA-2 provides strong language handling and a solid interaction experience.",
"meta-llama/Llama-3-70b-chat-hf.description": "Llama 3 70B Instruct Reference is a powerful chat model for complex dialogues.",
"meta-llama/Llama-3-8b-chat-hf.description": "Llama 3 8B Instruct Reference offers multilingual support and broad domain knowledge.",
"meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo.description": "LLaMA 3.2 is designed for tasks combining vision and text. It excels at image captioning and visual QA, bridging language generation and visual reasoning.",
"meta-llama/Llama-3.2-3B-Instruct-Turbo.description": "LLaMA 3.2 is designed for tasks combining vision and text. It excels at image captioning and visual QA, bridging language generation and visual reasoning.",
"meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo.description": "LLaMA 3.2 is designed for tasks combining vision and text. It excels at image captioning and visual QA, bridging language generation and visual reasoning.",
"meta-llama/Llama-3.3-70B-Instruct-Turbo.description": "Meta Llama 3.3 multilingual LLM is a 70B (text-in/text-out) pretrained and instruction-tuned model. The instruction-tuned text-only version is optimized for multilingual chat and outperforms many open and closed chat models on common industry benchmarks.",
"meta-llama/Llama-Vision-Free.description": "LLaMA 3.2 is designed for tasks combining vision and text. It excels at image captioning and visual QA, bridging language generation and visual reasoning.",
"meta-llama/Meta-Llama-3-70B-Instruct-Lite.description": "Llama 3 70B Instruct Lite is built for high performance with lower latency.",
"meta-llama/Meta-Llama-3-70B-Instruct-Turbo.description": "Llama 3 70B Instruct Turbo delivers strong understanding and generation for the most demanding workloads.",
"meta-llama/Meta-Llama-3-8B-Instruct-Lite.description": "Llama 3 8B Instruct Lite balances performance for resource-constrained environments.",
"meta-llama/Meta-Llama-3-8B-Instruct-Turbo.description": "Llama 3 8B Instruct Turbo is a high-performance LLM for a wide range of use cases.",
"meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo.description": "The 405B Llama 3.1 Turbo model provides massive context capacity for big data processing and excels in ultra-scale AI applications.",
"meta-llama/Meta-Llama-3.1-405B-Instruct.description": "Llama 3.1 is Metas leading model family, scaling up to 405B parameters for complex dialogue, multilingual translation, and data analysis.",
"meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo.description": "Llama 3.1 70B is finely tuned for high-load applications; FP8 quantization delivers efficient compute and accuracy for complex scenarios.",
"meta-llama/Meta-Llama-3.1-70B.description": "Llama 3.1 is Metas leading model family, scaling up to 405B parameters for complex dialogue, multilingual translation, and data analysis.",
"meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo.description": "Llama 3.1 8B uses FP8 quantization, supports up to 131,072 context tokens, and ranks among top open models for complex tasks across many benchmarks.",
"meta-llama/llama-3-70b-instruct.description": "Llama 3 70B Instruct is optimized for high-quality dialogue and performs strongly in human evaluations.",
"meta-llama/llama-3-8b-instruct.description": "Llama 3 8B Instruct is optimized for high-quality dialogue, outperforming many closed models.",
"meta-llama/llama-3.1-70b-instruct.description": "Metas latest Llama 3.1 series, the 70B instruction-tuned variant optimized for high-quality dialogue. In industry evaluations, it shows strong performance against leading closed models. (Available only to enterprise-verified entities.)",
"meta-llama/llama-3.1-8b-instruct.description": "Metas latest Llama 3.1 series, the 8B instruction-tuned variant is especially fast and efficient. In industry evaluations, it delivers strong performance, surpassing many leading closed models. (Available only to enterprise-verified entities.)",
"meta-llama/llama-3.1-8b-instruct:free.description": "LLaMA 3.1 offers multilingual support and is one of the leading generative models.",
"meta-llama/llama-3.2-11b-vision-instruct.description": "LLaMA 3.2 is designed for tasks combining vision and text. It excels at image captioning and visual QA, bridging language generation and visual reasoning.",
"meta-llama/llama-3.2-3b-instruct.description": "meta-llama/llama-3.2-3b-instruct",
"meta-llama/llama-3.3-70b-instruct.description": "Llama 3.3 is the most advanced multilingual open-source Llama model, delivering near-405B performance at very low cost. It is Transformer-based and improved with SFT and RLHF for usefulness and safety. The instruction-tuned version is optimized for multilingual chat and beats many open and closed chat models on industry benchmarks. Knowledge cutoff: Dec 2023.",
"meta-llama/llama-3.3-70b-instruct:free.description": "Llama 3.3 is the most advanced multilingual open-source Llama model, delivering near-405B performance at very low cost. It is Transformer-based and improved with SFT and RLHF for usefulness and safety. The instruction-tuned version is optimized for multilingual chat and beats many open and closed chat models on industry benchmarks. Knowledge cutoff: Dec 2023.",
"meta.llama3-1-405b-instruct-v1:0.description": "Meta Llama 3.1 405B Instruct is the largest and most powerful Llama 3.1 Instruct model, a highly advanced model for dialogue reasoning and synthetic data generation, and a strong base for domain-specific continued pretraining or fine-tuning. The Llama 3.1 multilingual LLMs are a set of pre-trained and instruction-tuned generation models in 8B, 70B, and 405B sizes (text in/text out). The instruction-tuned text models are optimized for multilingual dialogue and outperform many available open chat models on common industry benchmarks. Llama 3.1 is designed for commercial and research use across languages. Instruction-tuned models are suited for assistant-style chat, while pretrained models fit broader natural language generation tasks. Llama 3.1 outputs can also be used to improve other models, including synthetic data generation and refinement. Llama 3.1 is an autoregressive Transformer model with an optimized architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety.",
"meta.llama3-1-70b-instruct-v1:0.description": "An updated Meta Llama 3.1 70B Instruct with an extended 128K context window, multilingual support, and improved reasoning. The Llama 3.1 multilingual LLMs are a set of pre-trained and instruction-tuned generation models in 8B, 70B, and 405B sizes (text in/text out). The instruction-tuned text models are optimized for multilingual dialogue and outperform many available open chat models on common industry benchmarks. Llama 3.1 is designed for commercial and research use across languages. Instruction-tuned models are suited for assistant-style chat, while pretrained models fit broader natural language generation tasks. Llama 3.1 outputs can also be used to improve other models, including synthetic data generation and refinement. Llama 3.1 is an autoregressive Transformer model with an optimized architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety.",
"meta.llama3-1-8b-instruct-v1:0.description": "An updated Meta Llama 3.1 8B Instruct with a 128K context window, multilingual support, and improved reasoning. The Llama 3.1 family includes 8B, 70B, and 405B instruction-tuned text models optimized for multilingual chat and strong benchmark performance. It is designed for commercial and research use across languages; instruction-tuned models suit assistant-style chat, while pretrained models fit broader generation tasks. Llama 3.1 outputs can also be used to improve other models (e.g., synthetic data and refinement). It is an autoregressive Transformer model, with SFT and RLHF to align for helpfulness and safety.",
"meta.llama3-70b-instruct-v1:0.description": "Meta Llama 3 is an open LLM for developers, researchers, and enterprises, designed to help them build, experiment, and responsibly scale generative AI ideas. As part of the foundation for global community innovation, it is well suited for content creation, conversational AI, language understanding, R&D, and enterprise applications.",
"meta.llama3-8b-instruct-v1:0.description": "Meta Llama 3 is an open LLM for developers, researchers, and enterprises, designed to help them build, experiment, and responsibly scale generative AI ideas. As part of the foundation for global community innovation, it is well suited to limited compute and resources, edge devices, and faster training times.",
"meta/Llama-3.2-11B-Vision-Instruct.description": "Strong image reasoning on high-resolution images, suited for visual understanding apps.",
"meta/Llama-3.2-90B-Vision-Instruct.description": "Advanced image reasoning for visual-understanding agent applications.",
"meta/Llama-3.3-70B-Instruct.description": "Llama 3.3 is the most advanced multilingual open-source Llama model, delivering near-405B performance at very low cost. It is Transformer-based and improved with SFT and RLHF for usefulness and safety. The instruction-tuned version is optimized for multilingual chat and beats many open and closed chat models on industry benchmarks. Knowledge cutoff: Dec 2023.",
"meta/Meta-Llama-3-70B-Instruct.description": "A powerful 70B-parameter model that excels at reasoning, coding, and broad language tasks.",
"meta/Meta-Llama-3-8B-Instruct.description": "A versatile 8B-parameter model optimized for chat and text generation.",
"meta/Meta-Llama-3.1-405B-Instruct.description": "Llama 3.1 instruction-tuned text model optimized for multilingual chat, performing strongly on common industry benchmarks among open and closed chat models.",
"meta/Meta-Llama-3.1-70B-Instruct.description": "Llama 3.1 instruction-tuned text model optimized for multilingual chat, performing strongly on common industry benchmarks among open and closed chat models.",
"meta/Meta-Llama-3.1-8B-Instruct.description": "Llama 3.1 instruction-tuned text model optimized for multilingual chat, performing strongly on common industry benchmarks among open and closed chat models.",
"meta/llama-3.1-405b-instruct.description": "An advanced LLM supporting synthetic data generation, knowledge distillation, and reasoning for chatbots, coding, and domain tasks.",
"meta/llama-3.1-70b-instruct.description": "Built for complex dialogue with excellent context understanding, reasoning, and text generation.",
"meta/llama-3.1-70b.description": "An updated Meta Llama 3 70B Instruct with 128K context, multilingual support, and improved reasoning.",
"meta/llama-3.1-8b-instruct.description": "A cutting-edge model with strong language understanding, reasoning, and text generation.",
"meta/llama-3.1-8b.description": "Llama 3.1 8B supports a 128K context window, ideal for real-time chat and data analysis, and offers significant cost savings versus larger models. Served by Groq on LPU hardware for fast, efficient inference.",
"meta/llama-3.2-11b-vision-instruct.description": "A frontier vision-language model that excels at high-quality reasoning from images.",
"meta/llama-3.2-11b.description": "An instruction-tuned image reasoning model (text+image input, text output) optimized for visual recognition, image reasoning, captioning, and general image QA.",
"meta/llama-3.2-1b-instruct.description": "A cutting-edge small language model with strong understanding, reasoning, and text generation.",
"meta/llama-3.2-1b.description": "Text-only model for on-device use cases like multilingual local retrieval, summarization, and rewriting.",
"meta/llama-3.2-3b-instruct.description": "A cutting-edge small language model with strong understanding, reasoning, and text generation.",
"meta/llama-3.2-3b.description": "Text-only model fine-tuned for on-device use cases like multilingual local retrieval, summarization, and rewriting.",
"meta/llama-3.2-90b-vision-instruct.description": "A frontier vision-language model that excels at high-quality reasoning from images.",
"meta/llama-3.2-90b.description": "An instruction-tuned image reasoning model (text+image input, text output) optimized for visual recognition, image reasoning, captioning, and general image QA.",
"meta/llama-3.3-70b-instruct.description": "An advanced LLM strong at reasoning, math, common sense, and function calling.",
"meta/llama-3.3-70b.description": "A perfect balance of performance and efficiency. Built for high-performance conversational AI in content creation, enterprise apps, and research, with strong language understanding for summarization, classification, sentiment, and code generation.",
"meta/llama-4-maverick.description": "The Llama 4 family is a native multimodal AI model set supporting text and multimodal experiences, using MoE for leading text and image understanding. Llama 4 Maverick is a 17B model with 128 experts, served by DeepInfra.",
"meta/llama-4-scout.description": "The Llama 4 family is a native multimodal AI model set supporting text and multimodal experiences, using MoE for leading text and image understanding. Llama 4 Scout is a 17B model with 16 experts, served by DeepInfra.",
"microsoft/Phi-3-medium-128k-instruct.description": "The same Phi-3-medium model with a larger context window for RAG or few-shot prompts.",
"microsoft/Phi-3-medium-4k-instruct.description": "A 14B-parameter model with higher quality than Phi-3-mini, focused on high-quality, reasoning-intensive data.",
"microsoft/Phi-3-mini-128k-instruct.description": "The same Phi-3-mini model with a larger context window for RAG or few-shot prompts.",
"microsoft/Phi-3-mini-4k-instruct.description": "The smallest Phi-3 family member, optimized for quality and low latency.",
"microsoft/Phi-3-small-128k-instruct.description": "The same Phi-3-small model with a larger context window for RAG or few-shot prompts.",
"microsoft/Phi-3-small-8k-instruct.description": "A 7B-parameter model with higher quality than Phi-3-mini, focused on high-quality, reasoning-intensive data.",
"microsoft/Phi-3.5-mini-instruct.description": "An updated version of the Phi-3-mini model.",
"microsoft/Phi-3.5-vision-instruct.description": "An updated version of the Phi-3-vision model.",
"microsoft/WizardLM-2-8x22B.description": "WizardLM 2 is a language model from Microsoft AI that excels at complex dialogue, multilingual tasks, reasoning, and assistants.",
"microsoft/wizardlm-2-8x22b.description": "WizardLM-2 8x22B is Microsoft AIs most advanced Wizard model with highly competitive performance.",
"mimo-v2-flash.description": "MiMo-V2-Flash is now officially open source! This is a MoE (Mixture-of-Experts) model purpose-built for extreme inference efficiency, with 309B total parameters (15B activated). Through innovations in a hybrid attention architecture and multi-layer MTP inference acceleration, it ranks among the global Top 2 open-source models across multiple agent benchmarking suites. Its coding capabilities surpass all open-source models and rival leading closed-source models such as Claude 4.5 Sonnet, while incurring only 2.5% of the inference cost and delivering 2× faster generation speed—pushing large-model inference efficiency to the limit.",
"mimo-v2-omni.description": "MiMo-V2-Omni is purpose-built for complex multimodal interaction and execution scenarios in the real world. We constructed a full-modality foundation from the ground up, integrating text, vision, and speech, and unified “perception” and “action” within a single architecture. This not only breaks the traditional limitation of models that emphasize understanding over execution, but also endows the model with native capabilities in multimodal perception, tool usage, function execution, and GUI operations. MiMo-V2-Omni can seamlessly integrate with major agent frameworks, achieving a leap from understanding to control while significantly lowering the barrier to deploying fully multimodal agents.",
"mimo-v2-pro.description": "Xiaomi MiMo-V2-Pro is specifically designed for high-intensity agent workflows in real-world scenarios. It features over 1 trillion total parameters (42B activated parameters), adopts an innovative hybrid attention architecture, and supports an ultra-long context length of up to 1 million tokens. Built on a powerful foundational model, we continuously scale computational resources across a broader range of agent scenarios, further expanding the action space of intelligence and achieving significant generalization—from coding to real-world task execution (“claw”).",
"minicpm-v.description": "MiniCPM-V is OpenBMBs next-generation multimodal model with excellent OCR and multimodal understanding for wide-ranging use cases.",
"minimax-m2.1.description": "MiniMax-M2.1 is the latest version of the MiniMax series, optimized for multilingual programming and real-world complex tasks. As an AI-native model, MiniMax-M2.1 achieves significant improvements in model performance, agent framework support, and multi-scenario adaptation, aiming to help enterprises and individuals find AI-native work and lifestyle more quickly.",
"minimax-m2.5.description": "MiniMax-M2.5 is a state-of-the-art large language model designed for real-world productivity and coding tasks.",
"minimax-m2.description": "MiniMax M2 is an efficient large language model built specifically for coding and agent workflows.",
"minimax/minimax-m2.1.description": "MiniMax-M2.1 is a lightweight, cutting-edge large language model optimized for coding, proxy workflows, and modern application development, providing cleaner, more concise output and faster perceptual response times.",
"minimax/minimax-m2.description": "MiniMax-M2 is a high-value model that excels at coding and agent tasks for many engineering scenarios.",
"minimaxai/minimax-m2.1.description": "MiniMax-M2.1 is a compact, fast, cost-effective MoE model built for top-tier coding and agent performance.",
"minimaxai/minimax-m2.5.description": "MiniMax-M2.5 is the latest large language model from MiniMax, featuring a Mixture-of-Experts (MoE) architecture with 229 billion total parameters. It achieves industry-leading performance in programming, agent tool calling, search tasks, and office scenarios.",
"minimaxai/minimax-m2.description": "MiniMax-M2 is a compact, fast, cost-effective MoE model (230B total, 10B active) built for top-tier coding and agent performance while retaining strong general intelligence. It excels at multi-file edits, code-run-fix loops, test validation, and complex toolchains.",
"ministral-3b-latest.description": "Ministral 3B is Mistrals top-tier edge model.",
"ministral-8b-latest.description": "Ministral 8B is a highly cost-effective edge model from Mistral.",
"mistral-ai/Mistral-Large-2411.description": "Mistrals flagship model for complex tasks needing large-scale reasoning or specialization (synthetic text generation, code generation, RAG, or agents).",
"mistral-ai/Mistral-Nemo.description": "Mistral Nemo is a cutting-edge LLM with state-of-the-art reasoning, world knowledge, and coding for its size.",
"mistral-ai/mistral-small-2503.description": "Mistral Small is suitable for any language-based task requiring high efficiency and low latency.",
"mistral-large-instruct.description": "Mistral-Large-Instruct-2407 is an advanced dense LLM with 123B parameters and state-of-the-art reasoning, knowledge, and coding.",
"mistral-large-latest.description": "Mistral Large is the flagship model, strong in multilingual tasks, complex reasoning, and code generation—ideal for high-end applications.",
"mistral-large.description": "Mixtral Large is Mistrals flagship model, combining code generation, math, and reasoning with a 128K context window.",
"mistral-medium-latest.description": "Mistral Medium 3.1 delivers state-of-the-art performance at 8× lower cost and simplifies enterprise deployment.",
"mistral-nemo-instruct.description": "Mistral-Nemo-Instruct-2407 is the instruction-tuned version of Mistral-Nemo-Base-2407.",
"mistral-nemo.description": "Mistral Nemo is a high-efficiency 12B model from Mistral AI and NVIDIA.",
"mistral-small-latest.description": "Mistral Small is a cost-effective, fast, and reliable option for translation, summarization, and sentiment analysis.",
"mistral-small.description": "Mistral Small is suitable for any language-based task requiring high efficiency and low latency.",
"mistral.description": "Mistral is Mistral AIs 7B model, suitable for varied language tasks.",
"mistral/codestral-embed.description": "A code embedding model for embedding codebases and repositories to support coding assistants.",
"mistral/codestral.description": "Mistral Codestral 25.01 is a state-of-the-art coding model optimized for low latency and high-frequency use. It supports 80+ languages and excels at FIM, code correction, and test generation.",
"mistral/devstral-small.description": "Devstral is an agentic LLM for software engineering tasks, making it a strong choice for software engineering agents.",
"mistral/magistral-medium.description": "Complex thinking supported by deep understanding with transparent reasoning you can follow and verify. It maintains high-fidelity reasoning across languages, even mid-task.",
"mistral/magistral-small.description": "Complex thinking supported by deep understanding with transparent reasoning you can follow and verify. It maintains high-fidelity reasoning across languages, even mid-task.",
"mistral/ministral-3b.description": "A compact, efficient model for on-device tasks like assistants and local analytics, delivering low-latency performance.",
"mistral/ministral-8b.description": "A more powerful model with faster, memory-efficient inference, ideal for complex workflows and demanding edge applications.",
"mistral/mistral-embed.description": "A general text embedding model for semantic search, similarity, clustering, and RAG workflows.",
"mistral/mistral-large.description": "Mistral Large is ideal for complex tasks that require strong reasoning or specialization—synthetic text generation, code generation, RAG, or agents.",
"mistral/mistral-small.description": "Mistral Small is ideal for simple, batchable tasks like classification, customer support, or text generation, delivering great performance at an affordable price.",
"mistral/mixtral-8x22b-instruct.description": "8x22B Instruct model. 8x22B is an open MoE model served by Mistral.",
"mistral/pixtral-12b.description": "A 12B model with image understanding and text.",
"mistral/pixtral-large.description": "Pixtral Large is the second model in our multimodal family with frontier-level image understanding. It handles documents, charts, and natural images while retaining Mistral Large 2s leading text understanding.",
"mistralai/Mistral-7B-Instruct-v0.1.description": "Mistral (7B) Instruct is known for strong performance across many language tasks.",
"mistralai/Mistral-7B-Instruct-v0.2.description": "Mistral (7B) Instruct v0.2 improves instruction handling and result accuracy.",
"mistralai/Mistral-7B-Instruct-v0.3.description": "Mistral (7B) Instruct v0.3 offers efficient compute and strong language understanding for many use cases.",
"mistralai/Mistral-7B-v0.1.description": "Mistral 7B is compact but high-performing, strong for batch processing and simple tasks like classification and text generation, with solid reasoning.",
"mistralai/Mixtral-8x22B-Instruct-v0.1.description": "Mixtral-8x22B Instruct (141B) is a very large LLM for heavy workloads.",
"mistralai/Mixtral-8x7B-Instruct-v0.1.description": "Mixtral-8x7B Instruct (46.7B) provides high capacity for large-scale data processing.",
"mistralai/Mixtral-8x7B-v0.1.description": "Mixtral 8x7B is a sparse MoE model that boosts inference speed, suitable for multilingual and code generation tasks.",
"mistralai/mistral-nemo.description": "Mistral Nemo is a 7.3B model with multilingual support and strong coding performance.",
"mixtral-8x7b-32768.description": "Mixtral 8x7B provides fault-tolerant parallel compute for complex tasks.",
"mixtral.description": "Mixtral is Mistral AIs MoE model with open weights, supporting code generation and language understanding.",
"mixtral:8x22b.description": "Mixtral is Mistral AIs MoE model with open weights, supporting code generation and language understanding.",
"moonshot-v1-128k-vision-preview.description": "Kimi vision models (including moonshot-v1-8k-vision-preview/moonshot-v1-32k-vision-preview/moonshot-v1-128k-vision-preview) can understand image content such as text, colors, and object shapes.",
"moonshot-v1-128k.description": "Moonshot V1 128K provides ultra-long context for very long text generation, handling up to 128,000 tokens for research, academic, and large-document scenarios.",
"moonshot-v1-32k-vision-preview.description": "Kimi vision models (including moonshot-v1-8k-vision-preview/moonshot-v1-32k-vision-preview/moonshot-v1-128k-vision-preview) can understand image content such as text, colors, and object shapes.",
"moonshot-v1-32k.description": "Moonshot V1 32K supports 32,768 tokens for medium-length context, ideal for long documents and complex dialogues in content creation, reports, and chat systems.",
"moonshot-v1-8k-vision-preview.description": "Kimi vision models (including moonshot-v1-8k-vision-preview/moonshot-v1-32k-vision-preview/moonshot-v1-128k-vision-preview) can understand image content such as text, colors, and object shapes.",
"moonshot-v1-8k.description": "Moonshot V1 8K is optimized for short text generation with efficient performance, handling 8,192 tokens for short chats, notes, and quick content.",
"moonshotai/Kimi-Dev-72B.description": "Kimi-Dev-72B is an open-source code LLM optimized with large-scale RL to produce robust, production-ready patches. It scores 60.4% on SWE-bench Verified, setting a new open-model record for automated software engineering tasks like bug fixing and code review.",
"moonshotai/Kimi-K2-Instruct-0905.description": "Kimi K2-Instruct-0905 is the newest and most powerful Kimi K2. It is a top-tier MoE model with 1T total and 32B active parameters. Key features include stronger agentic coding intelligence with significant gains on benchmarks and real-world agent tasks, plus improved frontend coding aesthetics and usability.",
"moonshotai/Kimi-K2-Thinking.description": "Kimi K2 Thinking is the latest and most powerful open-source thinking model. It greatly extends multi-step reasoning depth and sustains stable tool use across 200300 consecutive calls, setting new records on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks. 'It excels in coding, math, logic, and agent scenarios. Built on an MoE architecture with ~1T total parameters, it supports a 256K context window and tool calling.",
"moonshotai/kimi-k2-0711.description": "Kimi K2 0711 is the instruct variant in the Kimi series, suited for high-quality code and tool use.",
"moonshotai/kimi-k2-0905.description": "Kimi K2 0905 is an update that expands context and reasoning performance with coding optimizations.",
"moonshotai/kimi-k2-instruct-0905.description": "The kimi-k2-0905-preview model supports a 256k context window, with stronger agentic coding, more polished and practical frontend code, and better context understanding.",
"moonshotai/kimi-k2-thinking-turbo.description": "Kimi K2 Thinking Turbo is a high-speed version of Kimi K2 Thinking, significantly lowering latency while retaining deep reasoning.",
"moonshotai/kimi-k2-thinking.description": "Kimi K2 Thinking is Moonshots reasoning model optimized for deep reasoning tasks, with general agent capabilities.",
"moonshotai/kimi-k2.5.description": "Kimi K2.5 is the most intelligent Kimi model to date, featuring native multimodal architecture.",
"moonshotai/kimi-k2.description": "Kimi K2 is a large MoE model from Moonshot AI with 1T total parameters and 32B active per forward pass, optimized for agent capabilities including advanced tool use, reasoning, and code synthesis.",
"morph/morph-v3-fast.description": "Morph provides a specialized model to apply code changes suggested by frontier models (e.g., Claude or GPT-4o) to your existing files at FAST 4500+ tokens/sec. It is the final step in an AI coding workflow and supports 16k input/output tokens.",
"morph/morph-v3-large.description": "Morph provides a specialized model to apply code changes suggested by frontier models (e.g., Claude or GPT-4o) to your existing files at FAST 2500+ tokens/sec. It is the final step in an AI coding workflow and supports 16k input/output tokens.",
"musesteamer-air-image.description": "musesteamer-air-image is an image-generation model developed by Baidus search team to deliver exceptional cost-performance. It can quickly generate clear, action-coherent images based on user prompts, turning user descriptions effortlessly into visuals.",
"nousresearch/hermes-2-pro-llama-3-8b.description": "Hermes 2 Pro Llama 3 8B is an updated Nous Hermes 2 version with the latest internally developed datasets.",
"nvidia/Llama-3.1-Nemotron-70B-Instruct-HF.description": "Llama 3.1 Nemotron 70B is an NVIDIA-customized LLM to improve helpfulness. It performs strongly on Arena Hard, AlpacaEval 2 LC, and GPT-4-Turbo MT-Bench, ranking #1 on all three auto-alignment benchmarks as of Oct 1, 2024. It is trained from Llama-3.1-70B-Instruct using RLHF (REINFORCE), Llama-3.1-Nemotron-70B-Reward, and HelpSteer2-Preference prompts.",
"nvidia/llama-3.1-nemotron-51b-instruct.description": "A distinctive language model delivering exceptional accuracy and efficiency.",
"nvidia/llama-3.1-nemotron-70b-instruct.description": "Llama-3.1-Nemotron-70B-Instruct is a custom NVIDIA model designed to improve the helpfulness of LLM responses.",
"o1-mini.description": "o1-mini is a fast, cost-effective reasoning model designed for coding, math, and science. It has 128K context and an October 2023 knowledge cutoff.",
"o1-preview.description": "o1 is OpenAIs new reasoning model for complex tasks requiring broad knowledge. It has 128K context and an October 2023 knowledge cutoff.",
"o1-pro.description": "The o1 series is trained with reinforcement learning to think before answering and handle complex reasoning. o1-pro uses more compute for deeper thinking and consistently higher-quality answers.",
"o1.description": "o1 is OpenAIs new reasoning model with text+image input and text output, suited for complex tasks requiring broad knowledge. It has a 200K context window and an October 2023 knowledge cutoff.",
"o3-2025-04-16.description": "o3 is OpenAIs new reasoning model with text+image input and text output for complex tasks requiring broad knowledge.",
"o3-deep-research.description": "o3-deep-research is our most advanced deep research model for complex multi-step tasks. It can search the web and access your data via MCP connectors.",
"o3-mini.description": "o3-mini is our latest small reasoning model, delivering higher intelligence at the same cost and latency targets as o1-mini.",
"o3-pro-2025-06-10.description": "o3 Pro is OpenAIs new reasoning model with text+image input and text output for complex tasks requiring broad knowledge.",
"o3-pro.description": "o3-pro uses more compute to think deeper and consistently deliver better answers; available only via the Responses API.",
"o3.description": "o3 is a powerful all-round model that sets a new bar for math, science, programming, and visual reasoning. It excels at technical writing and instruction following and can analyze text, code, and images for multi-step problems.",
"o4-mini-2025-04-16.description": "o4-mini is an OpenAI reasoning model with text+image input and text output, suited for complex tasks requiring broad knowledge, with a 200K context window.",
"o4-mini-deep-research.description": "o4-mini-deep-research is a faster, more affordable deep research model for complex multi-step research. It can search the web and also access your data via MCP connectors.",
"o4-mini.description": "o4-mini is the latest small o-series model, optimized for fast, effective reasoning with high efficiency in coding and vision tasks.",
"open-codestral-mamba.description": "Codestral Mamba is a Mamba 2 language model focused on code generation, supporting advanced coding and reasoning tasks.",
"open-mistral-7b.description": "Mistral 7B is compact but high-performing, strong for batch processing and simple tasks like classification and text generation, with solid reasoning.",
"open-mistral-nemo.description": "Mistral Nemo is a 12B model co-developed with Nvidia, offering strong reasoning and coding performance with easy integration.",
"open-mixtral-8x22b.description": "Mixtral 8x22B is a larger MoE model for complex tasks, offering strong reasoning and higher throughput.",
"open-mixtral-8x7b.description": "Mixtral 8x7B is a sparse MoE model that boosts inference speed, suitable for multilingual and code generation tasks.",
"openai/gpt-3.5-turbo-instruct.description": "Similar capabilities to GPT-3-era models, compatible with legacy completion endpoints rather than chat.",
"openai/gpt-3.5-turbo.description": "OpenAIs most capable and cost-effective GPT-3.5 model, optimized for chat but still strong on classic completions.",
"openai/gpt-4-turbo.description": "OpenAIs gpt-4-turbo has broad general knowledge and domain expertise, follows complex natural-language instructions, and solves difficult problems accurately. Knowledge cutoff is April 2023 with a 128k context window.",
"openai/gpt-4.1-mini.description": "GPT-4.1 Mini offers lower latency and better value for mid-context workloads.",
"openai/gpt-4.1-nano.description": "GPT-4.1 Nano is an ultra-low-cost, low-latency option for high-frequency short chats or classification.",
"openai/gpt-4.1.description": "The GPT-4.1 series provides larger context windows and stronger engineering and reasoning capabilities.",
"openai/gpt-4o-mini.description": "GPT-4o-mini is a fast, small GPT-4o variant for low-latency multimodal use.",
"openai/gpt-4o.description": "The GPT-4o family is OpenAIs Omni model with text + image input and text output.",
"openai/gpt-5-chat.description": "GPT-5 Chat is a GPT-5 variant optimized for conversations with lower latency for better interactivity.",
"openai/gpt-5-codex.description": "GPT-5-Codex is a GPT-5 variant further optimized for coding and large-scale code workflows.",
"openai/gpt-5-mini.description": "GPT-5 Mini is a smaller GPT-5 variant for low-latency, low-cost scenarios.",
"openai/gpt-5-nano.description": "GPT-5 Nano is the ultra-small variant for scenarios with strict cost and latency constraints.",
"openai/gpt-5-pro.description": "GPT-5 Pro is OpenAIs flagship model, providing stronger reasoning, code generation, and enterprise-grade features, with test-time routing and stricter safety policies.",
"openai/gpt-5.1-chat.description": "GPT-5.1 Chat is the lightweight member of the GPT-5.1 family, optimized for low-latency conversations while retaining strong reasoning and instruction execution.",
"openai/gpt-5.1-codex-mini.description": "GPT-5.1-Codex-Mini is a smaller, faster version of GPT-5.1-Codex, better for latency- and cost-sensitive coding scenarios.",
"openai/gpt-5.1-codex.description": "GPT-5.1-Codex is a GPT-5.1 variant optimized for software engineering and coding workflows, suitable for large refactors, complex debugging, and long autonomous coding tasks.",
"openai/gpt-5.1.description": "GPT-5.1 is the latest flagship in the GPT-5 series, with significant improvements over GPT-5 in general reasoning, instruction following, and conversational naturalness, suitable for broad tasks.",
"openai/gpt-5.2-chat.description": "GPT-5.2 Chat is the ChatGPT variant for experiencing the newest conversation improvements.",
"openai/gpt-5.2-pro.description": "GPT-5.2 Pro: a smarter, more precise GPT-5.2 variant (Responses API only), suited for harder problems and longer multi-turn reasoning.",
"openai/gpt-5.2.description": "GPT-5.2 is a flagship model for coding and agentic workflows with stronger reasoning and long-context performance.",
"openai/gpt-5.description": "GPT-5 is OpenAIs high-performance model for a wide range of production and research tasks.",
"openai/gpt-oss-120b.description": "A highly capable general-purpose LLM with strong, controllable reasoning.",
"openai/gpt-oss-20b.description": "A compact, open-weights language model optimized for low latency and resource-constrained environments, including local and edge deployments.",
"openai/o1-mini.description": "o1-mini is a fast, cost-effective reasoning model designed for coding, math, and science. It has 128K context and an October 2023 knowledge cutoff.",
"openai/o1-preview.description": "o1 is OpenAIs new reasoning model for complex tasks requiring broad knowledge. It has 128K context and an October 2023 knowledge cutoff.",
"openai/o1.description": "OpenAI o1 is a flagship reasoning model built for complex problems that require deep thinking, delivering strong reasoning and higher accuracy on multi-step tasks.",
"openai/o3-mini-high.description": "o3-mini (high reasoning) delivers higher intelligence at the same cost and latency targets as o1-mini.",
"openai/o3-mini.description": "o3-mini is OpenAIs latest small reasoning model, delivering higher intelligence at the same cost and latency targets as o1-mini.",
"openai/o3.description": "OpenAI o3 is the most powerful reasoning model, setting new SOTA in coding, math, science, and visual perception. It excels at complex, multi-faceted queries and is particularly strong at analyzing images, charts, and diagrams.",
"openai/o4-mini-high.description": "o4-mini high reasoning tier, optimized for fast, efficient reasoning with strong coding and vision performance.",
"openai/o4-mini.description": "OpenAI o4-mini is a small, efficient reasoning model for low-latency scenarios.",
"openai/text-embedding-3-large.description": "OpenAIs most capable embedding model for English and non-English tasks.",
"openai/text-embedding-3-small.description": "OpenAIs improved, higher-performance ada embedding model variant.",
"openai/text-embedding-ada-002.description": "OpenAIs legacy text embedding model.",
"openrouter/auto.description": "Based on context length, topic, and complexity, your request is routed to Llama 3 70B Instruct, Claude 3.5 Sonnet (self-moderated), or GPT-4o.",
"oswe-vscode-prime.description": "Raptor mini is a preview model optimized for code-related tasks.",
"oswe-vscode-secondary.description": "Raptor mini is a preview model optimized for code-related tasks.",
"paratera/deepseek-v3.2.description": "DeepSeek V3.2 is a model that strikes a balance between high computational efficiency and excellent reasoning and agent performance.",
"perplexity/sonar-pro.description": "Perplexitys flagship product with search grounding, supporting advanced queries and follow-ups.",
"perplexity/sonar-reasoning-pro.description": "An advanced reasoning-focused model that outputs CoT with enhanced search, including multiple search queries per request.",
"perplexity/sonar-reasoning.description": "A reasoning-focused model that outputs chain-of-thought (CoT) with detailed, search-grounded explanations.",
"perplexity/sonar.description": "Perplexitys lightweight product with search grounding, faster and cheaper than Sonar Pro.",
"phi3.description": "Phi-3 is Microsofts lightweight open model for efficient integration and large-scale reasoning.",
"phi3:14b.description": "Phi-3 is Microsofts lightweight open model for efficient integration and large-scale reasoning.",
"pixtral-12b-2409.description": "Pixtral is strong at chart/image understanding, document QA, multimodal reasoning, and instruction following. It ingests images at native resolution/aspect ratio and handles any number of images within a 128K context window.",
"pixtral-large-latest.description": "Pixtral Large is a 124B-parameter open multimodal model built on Mistral Large 2, the second in our multimodal family with frontier-level image understanding.",
"pro-128k.description": "Spark Pro 128K provides a very large context capacity, handling up to 128K context, ideal for long-form documents requiring full-text analysis and long-range coherence, with smooth logic and diverse citation support in complex discussions.",
"pro-deepseek-r1.description": "Enterprise dedicated service model with bundled concurrency.",
"pro-deepseek-v3.description": "Enterprise dedicated service model with bundled concurrency.",
"qianfan-70b.description": "Qianfan 70B is a large Chinese model for high-quality generation and complex reasoning.",
"qianfan-8b.description": "Qianfan 8B is a mid-size general model balancing cost and quality for text generation and QA.",
"qianfan-agent-intent-32k.description": "Qianfan Agent Intent 32K targets intent recognition and agent orchestration with long context support.",
"qianfan-agent-lite-8k.description": "Qianfan Agent Lite 8K is a lightweight agent model for low-cost multi-turn dialogue and workflows.",
"qianfan-check-vl.description": "Qianfan Check VL is a multimodal content review model for image-text compliance and recognition tasks.",
"qianfan-composition.description": "Qianfan Composition is a multimodal creation model for mixed image-text understanding and generation.",
"qianfan-engcard-vl.description": "Qianfan EngCard VL is a multimodal recognition model focused on English scenarios.",
"qianfan-llama-vl-8b.description": "Qianfan Llama VL 8B is a Llama-based multimodal model for general image-text understanding.",
"qianfan-multipicocr.description": "Qianfan MultiPicOCR is a multi-image OCR model for text detection and recognition across images.",
"qianfan-qi-vl.description": "Qianfan QI VL is a multimodal QA model for accurate retrieval and QA in complex image-text scenarios.",
"qianfan-singlepicocr.description": "Qianfan SinglePicOCR is a single-image OCR model with high-accuracy character recognition.",
"qianfan-vl-70b.description": "Qianfan VL 70B is a large VLM for complex image-text understanding.",
"qianfan-vl-8b.description": "Qianfan VL 8B is a lightweight VLM for daily image-text QA and analysis.",
"qvq-72b-preview.description": "QVQ-72B-Preview is an experimental research model from Qwen focused on improving visual reasoning.",
"qvq-max.description": "Qwen QVQ visual reasoning model supports vision input and chain-of-thought output, with stronger performance in math, coding, visual analysis, creative, and general tasks.",
"qvq-plus.description": "Visual reasoning model with vision input and chain-of-thought output. The qvq-plus series follows qvq-max and offers faster reasoning with a better quality-cost balance.",
"qwen-coder-plus.description": "Qwen code model.",
"qwen-coder-turbo-latest.description": "Qwen code model.",
"qwen-coder-turbo.description": "Qwen code model.",
"qwen-flash.description": "Fastest and lowest-cost Qwen model, ideal for simple tasks.",
"qwen-image-2.0-pro.description": "The Qwen-Image-2.0 series full-version model integrates image generation and image editing into a unified capability. It supports more professional text rendering with up to 1k token instruction capacity, delivers more delicate and realistic visual textures, enables fine-grained depiction of realistic scenes, and demonstrates stronger semantic alignment with prompts. The full-version model provides the strongest text rendering capability and the highest level of realism within the 2.0 series.",
"qwen-image-2.0.description": "The Qwen-Image-2.0 series accelerated version model integrates image generation and image editing into a unified capability. It supports more professional text rendering with up to 1k token instruction capacity, provides more refined and realistic visual textures, enables fine-grained depiction of realistic scenes, and demonstrates stronger semantic adherence to prompts. The accelerated version effectively achieves the optimal balance between model quality and performance.",
"qwen-image-edit-max.description": "Qwen Image Editing Model supports multi-image input and multi-image output, enabling precise in-image text editing, object addition, removal, or relocation, subject action modification, image style transfer, and enhanced visual detail.",
"qwen-image-edit-plus.description": "Qwen Image Editing Model supports multi-image input and multi-image output, enabling precise in-image text editing, object addition, removal, or relocation, subject action modification, image style transfer, and enhanced visual detail.",
"qwen-image-edit.description": "Qwen Image Edit is an image-to-image model that edits images based on input images and text prompts, enabling precise adjustments and creative transformations.",
"qwen-image-max.description": "Qwen Image Generation Model (Max series) delivers enhanced realism and visual naturalness compared with the Plus series, effectively reducing AI-generated artifacts, and demonstrating outstanding performance in human appearance, texture details, and text rendering.",
"qwen-image-plus.description": "It supports a wide range of artistic styles and is particularly proficient at rendering complex text within images, enabling integrated imagetext layout design.",
"qwen-image.description": "Qwen-Image is a general image generation model supporting multiple art styles and strong complex text rendering, especially Chinese and English. It supports multi-line layouts, paragraph-level text, and fine detail for complex text-image layouts.",
"qwen-long.description": "Ultra-large Qwen model with long context and chat across long- and multi-document scenarios.",
"qwen-math-plus-latest.description": "Qwen Math is a language model specialized for solving math problems.",
"qwen-math-plus.description": "Qwen Math is a language model specialized for solving math problems.",
"qwen-math-turbo-latest.description": "Qwen Math is a language model specialized for solving math problems.",
"qwen-math-turbo.description": "Qwen Math is a language model specialized for solving math problems.",
"qwen-max.description": "Hundred-billion-scale ultra-large Qwen model supporting Chinese, English, and other languages; the API model behind current Qwen2.5 products.",
"qwen-omni-turbo.description": "Qwen-Omni models support multimodal inputs (video, audio, images, text) and output audio and text.",
"qwen-plus.description": "Enhanced ultra-large Qwen model supporting Chinese, English, and other languages.",
"qwen-turbo.description": "Qwen Turbo will no longer be updated; replace it with Qwen Flash. Ultra-large Qwen model supporting Chinese, English, and other languages.",
"qwen-vl-chat-v1.description": "Qwen VL supports flexible interactions including multi-image input, multi-turn QA, and creative tasks.",
"qwen-vl-max-latest.description": "Ultra-large Qwen vision-language model. Compared to the enhanced version, it further improves visual reasoning and instruction following for stronger perception and cognition.",
"qwen-vl-max.description": "Ultra-large Qwen vision-language model. Compared to the enhanced version, it further improves visual reasoning and instruction following for stronger visual perception and cognition.",
"qwen-vl-ocr.description": "Qwen OCR is a text extraction model for documents, tables, exam images, and handwriting. It supports Chinese, English, French, Japanese, Korean, German, Russian, Italian, Vietnamese, and Arabic.",
"qwen-vl-plus-latest.description": "Enhanced large-scale Qwen vision-language model with major gains in detail and text recognition, supporting over one-megapixel resolution and arbitrary aspect ratios.",
"qwen-vl-plus.description": "Enhanced large-scale Qwen vision-language model with major gains in detail and text recognition, supporting over one-megapixel resolution and arbitrary aspect ratios.",
"qwen-vl-v1.description": "Pretrained model initialized from Qwen-7B with an added vision module and 448 image resolution input.",
"qwen/qwen-2-7b-instruct.description": "Qwen2 is the new Qwen LLM series. Qwen2 7B is a transformer-based model that excels in language understanding, multilingual capability, programming, math, and reasoning.",
"qwen/qwen-2-7b-instruct:free.description": "Qwen2 is a new large language model family with stronger understanding and generation.",
"qwen/qwen-2-vl-72b-instruct.description": "Qwen2-VL is the latest iteration of Qwen-VL, reaching state-of-the-art performance on vision benchmarks such as MathVista, DocVQA, RealWorldQA, and MTVQA. It can understand 20+ minutes of video for high-quality video Q&A, dialogue, and content creation. It also handles complex reasoning and decision-making, integrating with mobile devices and robots to act based on visual context and text instructions. Beyond English and Chinese, it also reads text in images across many languages, including most European languages, Japanese, Korean, Arabic, and Vietnamese.",
"qwen/qwen-2.5-72b-instruct.description": "Qwen2.5-72B-Instruct is one of Alibaba Clouds latest LLM releases. The 72B model brings notable improvements in coding and math, supports over 29 languages (including Chinese and English), and significantly improves instruction following, structured data understanding, and structured output (especially JSON).",
"qwen/qwen2.5-32b-instruct.description": "Qwen2.5-32B-Instruct is one of Alibaba Clouds latest LLM releases. The 32B model brings notable improvements in coding and math, supports over 29 languages (including Chinese and English), and significantly improves instruction following, structured data understanding, and structured output (especially JSON).",
"qwen/qwen2.5-7b-instruct.description": "A bilingual LLM for Chinese and English across language, coding, math, and reasoning.",
"qwen/qwen2.5-coder-32b-instruct.description": "An advanced LLM for code generation, reasoning, and repair across mainstream programming languages.",
"qwen/qwen2.5-coder-7b-instruct.description": "A strong mid-sized code model with 32K context, excelling at multilingual programming.",
"qwen/qwen3-14b.description": "Qwen3-14B is the 14B variant for general reasoning and chat scenarios.",
"qwen/qwen3-14b:free.description": "Qwen3-14B is a dense 14.8B-parameter causal LLM built for complex reasoning and efficient chat. It switches between a thinking mode for math, coding, and logic and a non-thinking mode for general chat. Fine-tuned for instruction following, agent tool use, and creative writing across 100+ languages and dialects. It natively handles 32K context and scales to 131K with YaRN.",
"qwen/qwen3-235b-a22b-2507.description": "Qwen3-235B-A22B-Instruct-2507 is the Instruct variant in the Qwen3 series, balancing multilingual instruction use with long-context scenarios.",
"qwen/qwen3-235b-a22b-thinking-2507.description": "Qwen3-235B-A22B-Thinking-2507 is the Thinking variant of Qwen3, strengthened for complex math and reasoning tasks.",
"qwen/qwen3-235b-a22b.description": "Qwen3-235B-A22B is a 235B-parameter MoE model from Qwen with 22B active per forward pass. It switches between a thinking mode for complex reasoning, math, and code and a non-thinking mode for efficient chat. It offers strong reasoning, multilingual support (100+ languages/dialects), advanced instruction following, and agent tool use. It natively handles 32K context and scales to 131K with YaRN.",
"qwen/qwen3-235b-a22b:free.description": "Qwen3-235B-A22B is a 235B-parameter MoE model from Qwen with 22B active per forward pass. It switches between a thinking mode for complex reasoning, math, and code and a non-thinking mode for efficient chat. It offers strong reasoning, multilingual support (100+ languages/dialects), advanced instruction following, and agent tool use. It natively handles 32K context and scales to 131K with YaRN.",
"qwen/qwen3-30b-a3b.description": "Qwen3 is the latest Qwen LLM generation with dense and MoE architectures, excelling at reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch between a thinking mode for complex reasoning and a non-thinking mode for efficient chat ensures versatile, high-quality performance.\n\nQwen3 significantly outperforms prior models like QwQ and Qwen2.5, delivering excellent math, coding, commonsense reasoning, creative writing, and interactive chat. The Qwen3-30B-A3B variant has 30.5B parameters (3.3B active), 48 layers, 128 experts (8 active per task), and supports up to 131K context with YaRN, setting a new bar for open models.",
"qwen/qwen3-30b-a3b:free.description": "Qwen3 is the latest Qwen LLM generation with dense and MoE architectures, excelling at reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch between a thinking mode for complex reasoning and a non-thinking mode for efficient chat ensures versatile, high-quality performance.\n\nQwen3 significantly outperforms prior models like QwQ and Qwen2.5, delivering excellent math, coding, commonsense reasoning, creative writing, and interactive chat. The Qwen3-30B-A3B variant has 30.5B parameters (3.3B active), 48 layers, 128 experts (8 active per task), and supports up to 131K context with YaRN, setting a new bar for open models.",
"qwen/qwen3-32b.description": "Qwen3-32B is a dense 32.8B-parameter causal LLM optimized for complex reasoning and efficient chat. It switches between a thinking mode for math, coding, and logic and a non-thinking mode for faster general chat. It performs strongly on instruction following, agent tool use, and creative writing across 100+ languages and dialects. It natively handles 32K context and scales to 131K with YaRN.",
"qwen/qwen3-32b:free.description": "Qwen3-32B is a dense 32.8B-parameter causal LLM optimized for complex reasoning and efficient chat. It switches between a thinking mode for math, coding, and logic and a non-thinking mode for faster general chat. It performs strongly on instruction following, agent tool use, and creative writing across 100+ languages and dialects. It natively handles 32K context and scales to 131K with YaRN.",
"qwen/qwen3-8b:free.description": "Qwen3-8B is a dense 8.2B-parameter causal LLM built for reasoning-heavy tasks and efficient chat. It switches between a thinking mode for math, coding, and logic and a non-thinking mode for general chat. Fine-tuned for instruction following, agent integration, and creative writing across 100+ languages and dialects. It natively supports 32K context and scales to 131K with YaRN.",
"qwen/qwen3-coder-plus.description": "Qwen3-Coder-Plus is a Qwen-series coding agent model optimized for more complex tool use and long-running sessions.",
"qwen/qwen3-coder.description": "Qwen3-Coder is the Qwen3 code-generation family, strong at long-document code understanding and generation.",
"qwen/qwen3-max-preview.description": "Qwen3 Max (preview) is the Max variant for advanced reasoning and tool integration.",
"qwen/qwen3-max.description": "Qwen3 Max is the high-end reasoning model in the Qwen3 series for multilingual reasoning and tool integration.",
"qwen/qwen3-vl-plus.description": "Qwen3 VL-Plus is the vision-enhanced Qwen3 variant with improved multimodal reasoning and video processing.",
"qwen/qwen3.5-122b-a10b.description": "Qwen3.5-122B-A10B is a native multimodal large language model developed by the Qwen team, with 122B total parameters and only 10B active parameters. The model employs a highly efficient hybrid architecture combining Gated Delta Networks with Sparse Mixture of Experts (MoE). It natively supports 256K context length, extendable to approximately 1 million tokens. Through early fusion training, the model achieves unified vision-language foundational capabilities, supporting text, image, and video understanding. It delivers excellent performance across multiple benchmarks including knowledge, reasoning, coding, agents, visual understanding, and multilingual tasks, surpassing GPT-5-mini and Qwen3-235B-A22B on several metrics. The model has Thinking Mode enabled by default, supports tool calling, and covers 201 languages and dialects.",
"qwen/qwen3.5-27b.description": "Qwen3.5-27B is a native multimodal large language model developed by the Qwen team with 27B parameters. The model employs a highly efficient hybrid architecture combining Gated Delta Networks with Gated Attention. It natively supports 256K context length, extendable to approximately 1 million tokens. Through early fusion training, the model achieves unified vision-language foundational capabilities, supporting text, image, and video understanding. It delivers excellent performance across multiple benchmarks including reasoning, coding, agents, and visual understanding, surpassing Qwen3-235B-A22B and GPT-5-mini on several metrics. The model has Thinking Mode enabled by default, supports tool calling, and covers 201 languages and dialects.",
"qwen/qwen3.5-35b-a3b.description": "Qwen3.5-35B-A3B is a native multimodal large language model developed by the Qwen team, with 35B total parameters and only 3B active parameters. The model employs a highly efficient hybrid architecture combining Gated Delta Networks with Sparse Mixture of Experts (MoE). It natively supports 256K context length, extendable to approximately 1 million tokens. Through early fusion training, the model achieves unified vision-language foundational capabilities, supporting text, image, and video understanding. It delivers excellent performance across multiple benchmarks including reasoning, coding, agents, and visual understanding. The model has Thinking Mode enabled by default, supports tool calling, and covers 201 languages and dialects.",
"qwen/qwen3.5-397b-a17b.description": "Qwen3.5-397B-A17B is the latest vision-language model in the Qwen series, featuring a Mixture of Experts (MoE) architecture with 397B total parameters and 17B active parameters. The model natively supports a 256K context length, extendable to approximately 1M tokens. It supports 201 languages and offers unified vision-language understanding capabilities, tool calling, and reasoning thinking modes.",
"qwen/qwen3.5-4b.description": "Qwen3.5-4B is a native multimodal large language model developed by the Qwen team with 4B parameters, making it the lightest Dense model in the Qwen3.5 series. The model employs a highly efficient hybrid architecture combining Gated Delta Networks with Gated Attention. It natively supports 256K context length, extendable to approximately 1 million tokens. Through early fusion training, the model achieves unified vision-language foundational capabilities, supporting text, image, and video understanding. It delivers excellent performance among models of similar size, surpassing GPT-5-Nano and Gemini-2.5-Flash-Lite on several metrics. The model has Thinking Mode enabled by default, supports tool calling, and covers 201 languages and dialects.",
"qwen/qwen3.5-9b.description": "Qwen3.5-9B is a native multimodal large language model developed by the Qwen team with 9B parameters. As the lightweight Dense model in the Qwen3.5 series, it employs a highly efficient hybrid architecture combining Gated Delta Networks with Gated Attention. It natively supports 256K context length, extendable to approximately 1 million tokens. Through early fusion training, the model achieves unified vision-language foundational capabilities, supporting text, image, and video understanding. The model has Thinking Mode enabled by default, supports tool calling, and covers 201 languages and dialects.",
"qwen2.5-14b-instruct-1m.description": "Qwen2.5 open-source 72B model.",
"qwen2.5-14b-instruct.description": "Qwen2.5 open-source 14B model.",
"qwen2.5-32b-instruct.description": "Qwen2.5 open-source 32B model.",
"qwen2.5-72b-instruct.description": "Qwen2.5 open-source 72B model.",
"qwen2.5-7b-instruct.description": "Qwen2.5 7B Instruct is a mature open-source instruct model for multi-scenario chat and generation.",
"qwen2.5-coder-1.5b-instruct.description": "Open-source Qwen code model.",
"qwen2.5-coder-14b-instruct.description": "Open-source Qwen code model.",
"qwen2.5-coder-32b-instruct.description": "Open-source Qwen code model.",
"qwen2.5-coder-7b-instruct.description": "Open-source Qwen code model.",
"qwen2.5-coder-instruct.description": "Qwen2.5-Coder is the latest code-focused LLM in the Qwen family (formerly CodeQwen).",
"qwen2.5-instruct.description": "Qwen2.5 is the latest Qwen LLM series, with base and instruction-tuned models ranging from 0.5B to 72B parameters.",
"qwen2.5-math-1.5b-instruct.description": "Qwen-Math delivers strong math problem-solving.",
"qwen2.5-math-72b-instruct.description": "Qwen-Math delivers strong math problem-solving.",
"qwen2.5-math-7b-instruct.description": "Qwen-Math delivers strong math problem-solving.",
"qwen2.5-omni-7b.description": "Qwen-Omni models support multimodal inputs (video, audio, images, text) and output audio and text.",
"qwen2.5-vl-32b-instruct.description": "Qwen2.5 VL 32B Instruct is an open-source multimodal model suitable for private deployment and multi-scenario use.",
"qwen2.5-vl-72b-instruct.description": "Improved instruction following, math, problem solving, and coding, with stronger general object recognition. Supports precise visual element localization across formats, long video understanding (up to 10 minutes) with second-level event timing, temporal ordering and speed understanding, and agents that can control OS or mobile via parsing and localization. Strong key info extraction and JSON output. This is the 72B, strongest version in the series.",
"qwen2.5-vl-7b-instruct.description": "Qwen2.5 VL 7B Instruct is a lightweight multimodal model balancing deployment cost and recognition ability.",
"qwen2.5-vl-instruct.description": "Qwen2.5-VL is the latest vision-language model in the Qwen family.",
"qwen2.5.description": "Qwen2.5 is Alibabas next-generation large language model with strong performance across diverse use cases.",
"qwen2.5:0.5b.description": "Qwen2.5 is Alibabas next-generation large language model with strong performance across diverse use cases.",
"qwen2.5:1.5b.description": "Qwen2.5 is Alibabas next-generation large language model with strong performance across diverse use cases.",
"qwen2.5:72b.description": "Qwen2.5 is Alibabas next-generation large language model with strong performance across diverse use cases.",
"qwen2.description": "Qwen2 is Alibabas next-generation large language model with strong performance across diverse use cases.",
"qwen2:0.5b.description": "Qwen2 is Alibabas next-generation large language model with strong performance across diverse use cases.",
"qwen2:1.5b.description": "Qwen2 is Alibabas next-generation large language model with strong performance across diverse use cases.",
"qwen2:72b.description": "Qwen2 is Alibabas next-generation large language model with strong performance across diverse use cases.",
"qwen3-0.6b.description": "Qwen3 0.6B is an entry-level model for simple reasoning and very constrained environments.",
"qwen3-1.7b.description": "Qwen3 1.7B is an ultra-light model for edge and device deployment.",
"qwen3-14b.description": "Qwen3 14B is a mid-size model for multilingual QA and text generation.",
"qwen3-235b-a22b-instruct-2507.description": "Qwen3 235B A22B Instruct 2507 is a flagship instruct model for a wide range of generation and reasoning tasks.",
"qwen3-235b-a22b-thinking-2507.description": "Qwen3 235B A22B Thinking 2507 is an ultra-large thinking model for hard reasoning.",
"qwen3-235b-a22b.description": "Qwen3 is a next-gen Tongyi Qwen model with major gains in reasoning, general ability, agent capabilities, and multilingual performance, and supports switching thinking modes.",
"qwen3-30b-a3b-instruct-2507.description": "Qwen3 30B A3B Instruct 2507 is a mid-large instruct model for high-quality generation and QA.",
"qwen3-30b-a3b-thinking-2507.description": "Qwen3 30B A3B Thinking 2507 is a mid-large thinking model balancing accuracy and cost.",
"qwen3-30b-a3b.description": "Qwen3 30B A3B is a mid-large general model balancing cost and quality.",
"qwen3-32b.description": "Qwen3 32B is suited for general tasks requiring stronger understanding.",
"qwen3-4b.description": "Qwen3 4B is suitable for small-to-mid apps and local inference.",
"qwen3-8b.description": "Qwen3 8B is a lightweight model with flexible deployment for high-concurrency workloads.",
"qwen3-coder-30b-a3b-instruct.description": "Open-source Qwen code model. The latest qwen3-coder-30b-a3b-instruct is based on Qwen3 and delivers strong coding-agent abilities, tool use, and environment interaction for autonomous programming, with excellent code performance and solid general capability.",
"qwen3-coder-480b-a35b-instruct.description": "Qwen3 Coder 480B A35B Instruct is a flagship code model for multilingual programming and complex code understanding.",
"qwen3-coder-flash.description": "Qwen code model. The latest Qwen3-Coder series is based on Qwen3 and delivers strong coding-agent abilities, tool use, and environment interaction for autonomous programming, with excellent code performance and solid general capability.",
"qwen3-coder-next.description": "Nextgen Qwen coder optimized for complex multi-file code generation, debugging, and highthroughput agent workflows. Designed for strong tool integration and improved reasoning performance.",
"qwen3-coder-plus.description": "Qwen code model. The latest Qwen3-Coder series is based on Qwen3 and delivers strong coding-agent abilities, tool use, and environment interaction for autonomous programming, with excellent code performance and solid general capability.",
"qwen3-coder:480b.description": "Alibaba's high-performance long-context model for agent and coding tasks.",
"qwen3-max-preview.description": "Best-performing Qwen model for complex, multi-step tasks. The preview supports thinking.",
"qwen3-max.description": "Qwen3 Max models deliver large gains over the 2.5 series in general ability, Chinese/English understanding, complex instruction following, subjective open tasks, multilingual ability, and tool use, with fewer hallucinations. The latest qwen3-max improves agentic programming and tool use over qwen3-max-preview. This release reaches field SOTA and targets more complex agent needs.",
"qwen3-next-80b-a3b-instruct.description": "Next-gen Qwen3 non-thinking open-source model. Compared to the prior version (Qwen3-235B-A22B-Instruct-2507), it has better Chinese understanding, stronger logical reasoning, and improved text generation.",
"qwen3-next-80b-a3b-thinking.description": "Qwen3 Next 80B A3B Thinking is a flagship reasoning model version for complex tasks.",
"qwen3-omni-flash.description": "Qwen-Omni accepts combined inputs across text, images, audio, and video, and outputs text or speech. It offers multiple natural voice styles, supports multilingual and dialect speech, and fits use cases like writing, vision recognition, and voice assistants.",
"qwen3-vl-235b-a22b-instruct.description": "Qwen3 VL 235B A22B Instruct is a flagship multimodal model for demanding understanding and creation.",
"qwen3-vl-235b-a22b-thinking.description": "Qwen3 VL 235B A22B Thinking is the flagship thinking version for complex multimodal reasoning and planning.",
"qwen3-vl-30b-a3b-instruct.description": "Qwen3 VL 30B A3B Instruct is a large multimodal model balancing accuracy and reasoning performance.",
"qwen3-vl-30b-a3b-thinking.description": "Qwen3 VL 30B A3B Thinking is a deep-thinking version for complex multimodal tasks.",
"qwen3-vl-32b-instruct.description": "Qwen3 VL 32B Instruct is a multimodal instruction-tuned model for high-quality image-text QA and creation.",
"qwen3-vl-32b-thinking.description": "Qwen3 VL 32B Thinking is a deep-thinking multimodal version for complex reasoning and long-chain analysis.",
"qwen3-vl-8b-instruct.description": "Qwen3 VL 8B Instruct is a lightweight multimodal model for daily visual QA and app integration.",
"qwen3-vl-8b-thinking.description": "Qwen3 VL 8B Thinking is a multimodal chain-of-thought model for detailed visual reasoning.",
"qwen3-vl-flash.description": "Qwen3 VL Flash: lightweight, high-speed reasoning version for latency-sensitive or high-volume requests.",
"qwen3-vl-plus.description": "Qwen VL is a text generation model with vision understanding. It can do OCR and also summarize and reason, such as extracting attributes from product photos or solving problems from images.",
"qwen3.5-122b-a10b.description": "Supports text, image, and video inputs. For text-only tasks, its performance is comparable to Qwen3 Max, offering higher efficiency and lower cost. In multimodal capabilities, it delivers significant improvements over the Qwen3 VL series.",
"qwen3.5-27b.description": "Supports text, image, and video inputs. For text-only tasks, its performance is comparable to Qwen3 Max, offering higher efficiency and lower cost. In multimodal capabilities, it delivers significant improvements over the Qwen3 VL series.",
"qwen3.5-35b-a3b.description": "Supports text, image, and video inputs. For text-only tasks, its performance is comparable to Qwen3 Max, offering higher efficiency and lower cost. In multimodal capabilities, it delivers significant improvements over the Qwen3 VL series.",
"qwen3.5-397b-a17b.description": "Supports text, image, and video inputs. For text-only tasks, its performance is comparable to Qwen3 Max, offering higher efficiency and lower cost. In multimodal capabilities, it delivers significant improvements over the Qwen3 VL series.",
"qwen3.5-flash.description": "Fastest and lowest-cost Qwen model, ideal for simple tasks.",
"qwen3.5-plus.description": "Qwen3.5 Plus supports text, image, and video input. Its performance on pure text tasks is comparable to Qwen3 Max, with better performance and lower cost. Its multimodal capabilities are significantly improved compared to the Qwen3 VL series.",
"qwen3.5:397b.description": "Qwen3.5 is a unified visionlanguage foundation model with a hybrid architecture (Mixture-of-Experts + linear attention), offering strong multimodal reasoning, coding, and long-context capabilities with a 256K context window.",
"qwen3.description": "Qwen3 is Alibabas next-generation large language model with strong performance across diverse use cases.",
"qwq-32b-preview.description": "QwQ is an experimental research model from Qwen focused on improved reasoning.",
"qwq-32b.description": "QwQ is a reasoning model in the Qwen family. Compared with standard instruction-tuned models, it brings thinking and reasoning that significantly boost downstream performance, especially on complex problems. QwQ-32B is a mid-sized reasoning model that rivals top reasoning models like DeepSeek-R1 and o1-mini.",
"qwq-plus.description": "QwQ reasoning model trained on Qwen2.5 uses RL to greatly improve reasoning. Core metrics in math/code (AIME 24/25, LiveCodeBench) and some general benchmarks (IFEval, LiveBench) reach the full DeepSeek-R1 level.",
"qwq.description": "QwQ is a reasoning model in the Qwen family. Compared with standard instruction-tuned models, it brings thinking and reasoning abilities that significantly improve downstream performance, especially on hard problems. QwQ-32B is a mid-sized reasoning model that competes well with top reasoning models like DeepSeek-R1 and o1-mini.",
"qwq_32b.description": "Mid-sized reasoning model in the Qwen family. Compared with standard instruction-tuned models, QwQs thinking and reasoning abilities significantly boost downstream performance, especially on hard problems.",
"r1-1776.description": "R1-1776 is a post-trained variant of DeepSeek R1 designed to provide uncensored, unbiased factual information.",
"seedance-1-5-pro-251215.description": "Seedance 1.5 Pro by ByteDance supports text-to-video, image-to-video (first frame, first+last frame), and audio generation synchronized with visuals.",
"seedream-5-0-260128.description": "ByteDance-Seedream-5.0-lite by BytePlus features web-retrieval-augmented generation for real-time information, enhanced complex prompt interpretation, and improved reference consistency for professional visual creation.",
"solar-mini-ja.description": "Solar Mini (Ja) extends Solar Mini with a focus on Japanese while maintaining efficient, strong performance in English and Korean.",
"solar-mini.description": "Solar Mini is a compact LLM that outperforms GPT-3.5, with strong multilingual capability supporting English and Korean, offering an efficient small-footprint solution.",
"solar-pro.description": "Solar Pro is a high-intelligence LLM from Upstage, focused on instruction following on a single GPU, with IFEval scores above 80. It currently supports English; the full release was planned for November 2024 with expanded language support and longer context.",
"sonar-deep-research.description": "Deep Research performs comprehensive expert-level research and synthesizes it into accessible, actionable reports.",
"sonar-pro.description": "An advanced search product with search grounding for complex queries and follow-ups.",
"sonar-reasoning-pro.description": "An advanced search product with search grounding for complex queries and follow-ups.",
"sonar-reasoning.description": "An advanced search product with search grounding for complex queries and follow-ups.",
"sonar.description": "A lightweight search-grounded product, faster and cheaper than Sonar Pro.",
"sophnet/deepseek-v3.2.description": "DeepSeek V3.2 is a model that strikes a balance between high computational efficiency and excellent reasoning and agent performance.",
"spark-x.description": "X2 Capabilities Overview: 1. Introduces dynamic adjustment of reasoning mode, controlled via the `thinking` field. 2. Expanded context length: 64K input tokens and 128K output tokens. 3. Supports Function Call functionality.",
"stable-diffusion-3-medium.description": "The latest text-to-image model from Stability AI. This version significantly improves image quality, text understanding, and style diversity, interpreting complex natural-language prompts more accurately and generating more precise, diverse images.",
"stable-diffusion-3.5-large-turbo.description": "stable-diffusion-3.5-large-turbo applies adversarial diffusion distillation (ADD) to stable-diffusion-3.5-large for faster speed.",
"stable-diffusion-3.5-large.description": "stable-diffusion-3.5-large is an 800M-parameter MMDiT text-to-image model with excellent quality and prompt alignment, supporting 1-megapixel images and efficient runs on consumer hardware.",
"stable-diffusion-v1.5.description": "stable-diffusion-v1.5 is initialized from the v1.2 checkpoint and fine-tuned for 595k steps on \"laion-aesthetics v2 5+\" at 512x512 resolution, reducing text conditioning by 10% to improve classifier-free guidance sampling.",
"stable-diffusion-xl-base-1.0.description": "An open-source text-to-image model from Stability AI with industry-leading creative image generation. It has strong instruction understanding and supports reverse prompt definitions for precise generation.",
"stable-diffusion-xl.description": "stable-diffusion-xl brings major improvements over v1.5 and matches top open text-to-image results. Improvements include a 3x larger UNet backbone, a refinement module for better image quality, and more efficient training techniques.",
"step-1-128k.description": "Balances performance and cost for general scenarios.",
"step-1-256k.description": "Extra-long context handling, ideal for long-document analysis.",
"step-1-32k.description": "Supports mid-length conversations for a wide range of scenarios.",
"step-1-8k.description": "Small model suited for lightweight tasks.",
"step-1-flash.description": "High-speed model suitable for real-time chat.",
"step-1.5v-mini.description": "Strong video understanding capabilities.",
"step-1o-turbo-vision.description": "Strong image understanding, outperforming 1o in math and coding. Smaller than 1o with faster output.",
"step-1o-vision-32k.description": "Strong image understanding with better visual performance than the Step-1V series.",
"step-1v-32k.description": "Supports vision inputs for richer multimodal interaction.",
"step-1v-8k.description": "Small vision model for basic image-and-text tasks.",
"step-1x-edit.description": "This model focuses on image editing, modifying and enhancing images based on user-provided images and text. It supports multiple input formats, including text descriptions and example images, and generates edits aligned with user intent.",
"step-1x-medium.description": "This model offers strong image generation with text prompt input. With native Chinese support, it better understands Chinese descriptions, captures their semantics, and converts them into visual features for more accurate generation. It produces high-resolution, high-quality images and supports a degree of style transfer.",
"step-2-16k-exp.description": "Experimental Step-2 build with the latest features and rolling updates. Not recommended for production.",
"step-2-16k.description": "Supports large-context interactions for complex dialogues.",
"step-2-mini.description": "Built on the next-generation in-house MFA attention architecture, delivering Step-1-like results at much lower cost while achieving higher throughput and faster latency. Handles general tasks with strong coding ability.",
"step-2x-large.description": "A new-generation StepFun image model focused on image generation, producing high-quality images from text prompts. It delivers more realistic texture and stronger Chinese/English text rendering.",
"step-3.5-flash.description": "Stepfuns flagship language reasoning model.This model has top-notch reasoning capabilities and fast and reliable execution capabilities.Able to decompose and plan complex tasks, call tools quickly and reliably to perform tasks, and be competent in various complex tasks such as logical reasoning, mathematics, software engineering, and in-depth research.",
"step-3.description": "This model has strong visual perception and complex reasoning, accurately handling cross-domain knowledge understanding, math-vision cross analysis, and a wide range of everyday visual analysis tasks.",
"step-r1-v-mini.description": "A reasoning model with strong image understanding that can process images and text, then generate text after deep reasoning. It excels at visual reasoning and delivers top-tier math, coding, and text reasoning, with a 100K context window.",
"stepfun-ai/step3.description": "Step3 is a cutting-edge multimodal reasoning model from StepFun, built on an MoE architecture with 321B total and 38B active parameters. Its end-to-end design minimizes decoding cost while delivering top-tier vision-language reasoning. With MFA and AFD design, it stays efficient on both flagship and low-end accelerators. Pretraining uses 20T+ text tokens and 4T image-text tokens across many languages. It reaches leading open-model performance on math, code, and multimodal benchmarks.",
"taichu4_vl_2b_nothinking.description": "The No-Thinking version of the Taichu4.0-VL 2B model features lower memory usage, a lightweight design, fast response speed, and strong multimodal understanding capabilities.",
"taichu4_vl_32b.description": "The Thinking version of the Taichu4.0-VL 32B model is suited for complex multimodal understanding and reasoning tasks, demonstrating outstanding performance in multimodal mathematical reasoning, multimodal agent capabilities, and general image and visual comprehension.",
"taichu4_vl_32b_nothinking.description": "The No-Thinking version of the Taichu4.0-VL 32B model is designed for complex image-and-text understanding and visual knowledge QA scenarios, excelling in image captioning, visual question answering, video comprehension, and visual localization tasks.",
"taichu4_vl_3b.description": "The Thinking version of the Taichu4.0-VL 3B model efficiently performs multimodal understanding and reasoning tasks, with comprehensive upgrades in visual comprehension, visual localization, OCR recognition, and related capabilities.",
"taichu_llm.description": "The Zidong Taichu large language model is a high-performance text-generation model developed using fully domestic full-stack technologies. Through structured compression of a hundred-billion-parameter base model and task-specific optimization, it significantly enhances complex text comprehension and knowledge reasoning capabilities. It excels in scenarios such as long-document analysis, cross-lingual information extraction, and knowledge-constrained generation.",
"taichu_llm_14b.description": "The Zidong Taichu large language model is a high-performance text-generation model developed using fully domestic full-stack technologies. Through structured compression of a hundred-billion-parameter base model and task-specific optimization, it significantly enhances complex text comprehension and knowledge reasoning capabilities. It excels in scenarios such as long-document analysis, cross-lingual information extraction, and knowledge-constrained generation.",
"taichu_llm_2b.description": "The Zidong Taichu large language model is a high-performance text-generation model developed using fully domestic full-stack technologies. Through structured compression of a hundred-billion-parameter base model and task-specific optimization, it significantly enhances complex text comprehension and knowledge reasoning capabilities. It excels in scenarios such as long-document analysis, cross-lingual information extraction, and knowledge-constrained generation.",
"taichu_o1.description": "taichu_o1 is a next-generation reasoning large model that achieves human-like chain-of-thought through multimodal interaction and reinforcement learning. It supports complex decision-making simulations and, while maintaining high-precision output, reveals interpretable reasoning pathways. It is well-suited for strategy analysis, deep thinking, and similar scenarios.",
"tencent/Hunyuan-A13B-Instruct.description": "Hunyuan-A13B-Instruct uses 80B total parameters with 13B active to match larger models. It supports fast/slow hybrid reasoning, stable long-text understanding, and leading agent ability on BFCL-v3 and τ-Bench. GQA and multi-quant formats enable efficient inference.",
"tencent/Hunyuan-MT-7B.description": "Hunyuan Translation Model includes Hunyuan-MT-7B and the ensemble Hunyuan-MT-Chimera. Hunyuan-MT-7B is a 7B lightweight translation model supporting 33 languages plus 5 Chinese minority languages. In WMT25 it took 30 first-place results across 31 language pairs. Tencent Hunyuan uses a full training pipeline from pretraining to SFT to translation RL and ensemble RL, achieving leading performance at its size with efficient, easy deployment.",
"text-embedding-3-large.description": "The most capable embedding model for English and non-English tasks.",
"text-embedding-3-small-inference.description": "Embedding V3 small (Inference) model for text embeddings.",
"text-embedding-3-small.description": "An efficient, cost-effective next-generation embedding model for retrieval and RAG scenarios.",
"text-embedding-ada-002.description": "Embedding V2 Ada model for text embeddings.",
"thudm/glm-4-32b.description": "GLM-4-32B-0414 is a 32B bilingual (Chinese/English) open-weights model optimized for code generation, function calling, and agent tasks. It is pretrained on 15T high-quality and reasoning-heavy data and further refined with human preference alignment, rejection sampling, and RL. It excels at complex reasoning, artifact generation, and structured output, reaching GPT-4o and DeepSeek-V3-0324-level performance on multiple benchmarks.",
"thudm/glm-4-32b:free.description": "GLM-4-32B-0414 is a 32B bilingual (Chinese/English) open-weights model optimized for code generation, function calling, and agent tasks. It is pretrained on 15T high-quality and reasoning-heavy data and further refined with human preference alignment, rejection sampling, and RL. It excels at complex reasoning, artifact generation, and structured output, reaching GPT-4o and DeepSeek-V3-0324-level performance on multiple benchmarks.",
"thudm/glm-4-9b-chat.description": "The open-source release of Zhipu AIs latest GLM-4 pretraining model.",
"thudm/glm-z1-32b.description": "GLM-Z1-32B-0414 is an enhanced reasoning variant of GLM-4-32B, built for deep math, logic, and code-focused problem solving. It applies expanded RL (task-specific and general pairwise preference) to improve complex multi-step tasks. Compared to GLM-4-32B, Z1 significantly improves structured reasoning and formal-domain capability.\n\nIt supports enforcing “thinking” steps via prompt engineering, improved coherence for long outputs, and is optimized for agent workflows with long context (via YaRN), JSON tool calling, and fine-grained sampling for stable reasoning. Ideal for use cases requiring careful multi-step or formal derivations.",
"thudm/glm-z1-rumination-32b.description": "GLM Z1 Rumination 32B is a 32B deep reasoning model in the GLM-4-Z1 series, optimized for complex open-ended tasks that require long thinking. Built on glm-4-32b-0414, it adds extra RL stages and multi-stage alignment, introducing a “rumination” capability that simulates extended cognitive processing. This includes iterative reasoning, multi-hop analysis, and tool-augmented workflows such as search, retrieval, and citation-aware synthesis.\n\nIt excels at research writing, comparative analysis, and complex QA. It supports function calling for search/navigation primitives (`search`, `click`, `open`, `finish`) for agent pipelines. Rumination behavior is controlled by multi-round loops with rule-based reward shaping and delayed decision mechanisms, benchmarked against deep research frameworks like OpenAIs internal alignment stack. This variant is for depth over speed.",
"tngtech/deepseek-r1t-chimera:free.description": "DeepSeek-R1T-Chimera is created by merging DeepSeek-R1 and DeepSeek-V3 (0324), combining R1 reasoning with V3 token efficiency. It is based on the DeepSeek-MoE Transformer and optimized for general text generation.\n\nIt merges pretrained weights to balance reasoning, efficiency, and instruction following. Released under the MIT license for research and commercial use.",
"togethercomputer/StripedHyena-Nous-7B.description": "StripedHyena Nous (7B) delivers enhanced compute efficiency through its architecture and strategy.",
"tts-1-hd.description": "The latest text-to-speech model optimized for quality.",
"tts-1.description": "The latest text-to-speech model optimized for real-time speed.",
"upstage/SOLAR-10.7B-Instruct-v1.0.description": "Upstage SOLAR Instruct v1 (11B) is tuned for precise instruction tasks with strong language performance.",
"us.anthropic.claude-3-5-sonnet-20241022-v2:0.description": "Claude 3.5 Sonnet raises the industry standard, outperforming competitors and Claude 3 Opus across broad evaluations while keeping mid-tier speed and cost.",
"us.anthropic.claude-3-7-sonnet-20250219-v1:0.description": "Claude 3.7 Sonnet is Anthropic's fastest next-gen model. Compared to Claude 3 Haiku, it improves across skills and surpasses the previous flagship Claude 3 Opus on many intelligence benchmarks.",
"v0-1.0-md.description": "v0-1.0-md is a legacy model served via the v0 API.",
"v0-1.5-lg.description": "v0-1.5-lg is suited for advanced thinking or reasoning tasks.",
"v0-1.5-md.description": "v0-1.5-md is suited for everyday tasks and UI generation.",
"vercel/v0-1.0-md.description": "Access the models behind v0 to generate, fix, and optimize modern web apps with framework-specific reasoning and up-to-date knowledge.",
"vercel/v0-1.5-md.description": "Access the models behind v0 to generate, fix, and optimize modern web apps with framework-specific reasoning and up-to-date knowledge.",
"volcengine/doubao-seed-2-0-code.description": "Doubao-Seed-2.0-Code is optimized for enterprise-level programming needs. Built on the excellent Agent and VLM capabilities of Seed 2.0, it specially enhances coding abilities with outstanding frontend performance and targeted optimization for common enterprise multi-language coding requirements, making it ideal for integration with various AI programming tools.",
"volcengine/doubao-seed-2-0-lite.description": "Balances generation quality and response speed, suitable as a general-purpose production-grade model",
"volcengine/doubao-seed-2-0-mini.description": "Points to the latest version of doubao-seed-2-0-mini",
"volcengine/doubao-seed-2-0-pro.description": "Points to the latest version of doubao-seed-2-0-pro",
"volcengine/doubao-seed-code.description": "Doubao-Seed-Code is ByteDance Volcano Engines LLM optimized for agentic programming, performing strongly on programming and agent benchmarks with 256K context support.",
"wan2.2-t2i-flash.description": "Wanxiang 2.2 Flash is the latest model with upgrades in creativity, stability, and realism, delivering fast generation and high value.",
"wan2.2-t2i-plus.description": "Wanxiang 2.2 Plus is the latest model with upgrades in creativity, stability, and realism, producing richer details.",
"wan2.5-i2i-preview.description": "Wanxiang 2.5 I2I Preview supports single-image editing and multi-image fusion.",
"wan2.5-t2i-preview.description": "Wanxiang 2.5 T2I supports flexible selection of image dimensions within total pixel area and aspect ratio constraints.",
"wan2.6-image.description": "Wanxiang 2.6 Image supports image editing and mixed imagetext layout output.",
"wan2.6-t2i.description": "Wanxiang 2.6 T2I supports flexible selection of image dimensions within total pixel area and aspect ratio constraints (same as Wanxiang 2.5).",
"wanx-v1.description": "Base text-to-image model. Corresponds to Tongyi Wanxiang 1.0 General.",
"wanx2.0-t2i-turbo.description": "Excels at textured portraits with moderate speed and lower cost. Corresponds to Tongyi Wanxiang 2.0 Speed.",
"wanx2.1-t2i-plus.description": "Fully upgraded version with richer image details and slightly slower speed. Corresponds to Tongyi Wanxiang 2.1 Pro.",
"wanx2.1-t2i-turbo.description": "Fully upgraded version with fast generation, strong overall quality, and high value. Corresponds to Tongyi Wanxiang 2.1 Speed.",
"whisper-1.description": "A general speech recognition model supporting multilingual ASR, speech translation, and language identification.",
"wizardlm2.description": "WizardLM 2 is a language model from Microsoft AI that excels at complex dialogue, multilingual tasks, reasoning, and assistants.",
"wizardlm2:8x22b.description": "WizardLM 2 is a language model from Microsoft AI that excels at complex dialogue, multilingual tasks, reasoning, and assistants.",
"x-ai/grok-4-fast-non-reasoning.description": "Grok 4 Fast (Non-Reasoning) is xAIs high-throughput, low-cost multimodal model (supports a 2M context window) for scenarios sensitive to latency and cost that do not require in-model reasoning. It sits alongside the reasoning version of Grok 4 Fast, and reasoning can be enabled via the API reasoning parameter when needed. Prompts and completions may be used by xAI or OpenRouter to improve future models.",
"x-ai/grok-4-fast.description": "Grok 4 Fast is xAIs high-throughput, low-cost model (supports a 2M context window), ideal for high-concurrency and long-context use cases.",
"x-ai/grok-4.1-fast-non-reasoning.description": "Grok 4 Fast (Non-Reasoning) is xAIs high-throughput, low-cost multimodal model (supports a 2M context window) for scenarios sensitive to latency and cost that do not require in-model reasoning. It sits alongside the reasoning version of Grok 4 Fast, and reasoning can be enabled via the API reasoning parameter when needed. Prompts and completions may be used by xAI or OpenRouter to improve future models.",
"x-ai/grok-4.1-fast.description": "Grok 4 Fast is xAIs high-throughput, low-cost model (supports a 2M context window), ideal for high-concurrency and long-context use cases.",
"x-ai/grok-4.description": "Grok 4 is xAI's flagship reasoning model with strong reasoning and multimodal capability.",
"x-ai/grok-code-fast-1.description": "Grok Code Fast 1 is xAI's fast code model with readable, engineering-friendly output.",
"x1.description": "X1.5 updates: (1) adds dynamic thinking mode controlled by the `thinking` field; (2) larger context length with 64K input and 64K output; (3) supports FunctionCall.",
"xai/grok-2-vision.description": "Grok 2 Vision excels at visual tasks, delivering SOTA performance on visual math reasoning (MathVista) and document QA (DocVQA). It handles documents, charts, graphs, screenshots, and photos.",
"xai/grok-2.description": "Grok 2 is a frontier model with state-of-the-art reasoning, strong chat, coding, and reasoning performance, and ranks above Claude 3.5 Sonnet and GPT-4 Turbo on LMSYS.",
"xai/grok-3-fast.description": "xAIs flagship model excels in enterprise use cases like data extraction, coding, and summarization, with deep domain knowledge in finance, healthcare, law, and science. The fast variant runs on quicker infrastructure for much faster responses at higher per-token cost.",
"xai/grok-3-mini-fast.description": "xAIs lightweight model that thinks before responding, ideal for simple or logic-based tasks without deep domain knowledge. Raw reasoning traces are available. The fast variant runs on quicker infrastructure for much faster responses at higher per-token cost.",
"xai/grok-3-mini.description": "xAIs lightweight model that thinks before responding, ideal for simple or logic-based tasks without deep domain knowledge. Raw reasoning traces are available.",
"xai/grok-3.description": "xAIs flagship model excels in enterprise use cases like data extraction, coding, and summarization, with deep domain knowledge in finance, healthcare, law, and science.",
"xai/grok-4.description": "xAIs newest flagship model with unparalleled performance in natural language, math, and reasoning—an ideal all-rounder.",
"yi-large-fc.description": "Built on yi-large with enhanced tool-calling, suited for agent and workflow scenarios.",
"yi-large-preview.description": "An early version; yi-large (newer) is recommended.",
"yi-large-rag.description": "An advanced service based on yi-large, combining retrieval and generation for precise answers with real-time web search.",
"yi-large-turbo.description": "Exceptional value and performance, tuned for a strong balance of quality, speed, and cost.",
"yi-large.description": "A new 100B-parameter model with strong Q&A and text generation.",
"yi-lightning-lite.description": "A lightweight version; yi-lightning is recommended.",
"yi-lightning.description": "A latest high-performance model with faster inference and high-quality output.",
"yi-medium-200k.description": "A 200K long-context model for deep long-form understanding and generation.",
"yi-medium.description": "A tuned mid-size model with balanced capability and value, optimized for instruction following.",
"yi-spark.description": "A compact, fast model with strengthened math and coding capabilities.",
"yi-vision-v2.description": "A vision model for complex tasks with strong multi-image understanding and analysis.",
"yi-vision.description": "A vision model for complex tasks with strong image understanding and analysis.",
"z-ai/glm-4.5-air.description": "GLM 4.5 Air is a lightweight GLM 4.5 variant for cost-sensitive scenarios while retaining strong reasoning.",
"z-ai/glm-4.5.description": "GLM 4.5 is Z.AIs flagship model with hybrid reasoning optimized for engineering and long-context tasks.",
"z-ai/glm-4.6.description": "GLM 4.6 is Z.AI's flagship model with extended context length and coding capability.",
"z-ai/glm-4.7.description": "GLM-4.7 is Zhipu's latest flagship model, offering improved general capabilities, simpler and more natural replies, and a more immersive writing experience.",
"z-ai/glm4.7.description": "GLM-4.7 is Zhipu latest flagship model, enhanced for Agentic Coding scenarios with improved coding capabilities.",
"z-ai/glm5.description": "GLM-5 is Zhipu AI's new flagship foundation model for agent engineering, achieving open-source SOTA performance in coding and agent capabilities. It matches Claude Opus 4.5 in performance.",
"z-image-turbo.description": "Z-Image is a lightweight text-to-image generation model that can rapidly produce images, supports both Chinese and English text rendering, and flexibly adapts to multiple resolutions and aspect ratios.",
"zai-glm-4.7.description": "This model delivers strong coding performance with advanced reasoning capabilities, superior tool use, and enhanced real-world performance in agentic coding applications.",
"zai-org/GLM-4.5-Air.description": "GLM-4.5-Air is a base model for agent applications using a Mixture-of-Experts architecture. It is optimized for tool use, web browsing, software engineering, and frontend coding, and integrates with code agents like Claude Code and Roo Code. It uses hybrid reasoning to handle both complex reasoning and everyday scenarios.",
"zai-org/GLM-4.5V.description": "GLM-4.5V is Zhipu AIs latest VLM, built on the GLM-4.5-Air flagship text model (106B total, 12B active) with an MoE architecture for strong performance at lower cost. It follows the GLM-4.1V-Thinking path and adds 3D-RoPE to improve 3D spatial reasoning. Optimized through pretraining, SFT, and RL, it handles images, video, and long documents and ranks top among open models on 41 public multimodal benchmarks. A Thinking mode toggle lets users balance speed and depth.",
"zai-org/GLM-4.6.description": "Compared to GLM-4.5, GLM-4.6 expands context from 128K to 200K for more complex agent tasks. It scores higher on code benchmarks and shows stronger real-world performance in apps like Claude Code, Cline, Roo Code, and Kilo Code, including better frontend page generation. Reasoning is improved and tool use is supported during reasoning, strengthening overall capability. It integrates better into agent frameworks, improves tool/search agents, and has more human-preferred writing style and roleplay naturalness.",
"zai-org/GLM-4.6V.description": "GLM-4.6V achieves SOTA visual understanding accuracy for its parameter scale and is the first to natively integrate Function Call capabilities into the vision model architecture, bridging the gap from \"visual perception\" to \"executable actions\" and providing a unified technical foundation for multimodal agents in real business scenarios. The visual context window is extended to 128k, supporting long video stream processing and high-resolution multi-image analysis.",
"zai/glm-4.5-air.description": "GLM-4.5 and GLM-4.5-Air are our latest flagships for agent applications, both using MoE. GLM-4.5 has 355B total and 32B active per forward pass; GLM-4.5-Air is slimmer with 106B total and 12B active.",
"zai/glm-4.5.description": "The GLM-4.5 series is designed for agents. The flagship GLM-4.5 combines reasoning, coding, and agent skills with 355B total params (32B active) and offers dual operation modes as a hybrid reasoning system.",
"zai/glm-4.5v.description": "GLM-4.5V builds on GLM-4.5-Air, inheriting proven GLM-4.1V-Thinking techniques and scaling with a strong 106B-parameter MoE architecture.",
"zenmux/auto.description": "ZenMux auto-routing selects the best-value, best-performing model from supported options based on your request."
}