diff --git a/plugin-dev-en/0411-persistent-storage-kv.mdx b/plugin-dev-en/0411-persistent-storage-kv.mdx
index cf09907e..8a79d7e0 100644
--- a/plugin-dev-en/0411-persistent-storage-kv.mdx
+++ b/plugin-dev-en/0411-persistent-storage-kv.mdx
@@ -7,63 +7,148 @@ dimensions:
standard_title: Persistent Storage KV
language: en
title: Persistent Storage
-description: This document introduces the persistent storage functionality in Dify
- plugins, detailing how to use the KV database in plugins to store, retrieve, and
- delete data. This feature enables plugins to persistently store data within the
- same Workspace, meeting the needs for data preservation across sessions.
+description: Learn how to implement persistent storage in your Dify plugins using the built-in key-value database to maintain state across interactions.
---
-When examining Tools and Endpoints in plugins individually, it's not difficult to see that in most cases, they can only complete single-round interactions: request, return data, and the task ends.
+## Overview
-If there is data that needs to be stored long-term, such as implementing persistent memory, the plugin needs to have persistent storage capabilities. **The persistent storage mechanism allows plugins to have the ability to persistently store data within the same Workspace**. Currently, a KV database is provided to meet storage needs, and more flexible and powerful storage interfaces may be introduced in the future based on actual usage.
+Most plugin tools and endpoints operate in a stateless, single-round interaction model:
+1. Receive a request
+2. Process data
+3. Return a response
+4. End the interaction
-### Storing Keys
+However, many real-world applications require maintaining state across multiple interactions. This is where **persistent storage** becomes essential.
-#### **Entry Point**
+
+The persistent storage mechanism allows plugins to store data persistently within the same workspace, enabling stateful applications and memory features.
+
+
+Dify currently provides a key-value (KV) storage system for plugins, with plans to introduce more flexible and powerful storage interfaces in the future based on developer needs.
+
+## Accessing Storage
+
+All storage operations are performed through the `storage` object available in your plugin's session:
```python
- self.session.storage
+# Access the storage interface
+storage = self.session.storage
```
-#### **Interface**
+## Storage Operations
+
+### Storing Data
+
+Store data with the `set` method:
```python
- def set(self, key: str, val: bytes) -> None:
- pass
+def set(self, key: str, val: bytes) -> None:
+ """
+ Store data in persistent storage
+
+ Parameters:
+ key: Unique identifier for your data
+ val: Binary data to store (bytes)
+ """
+ pass
```
-Note that what is passed in is bytes, so you can actually store files in it.
+
+The value must be in `bytes` format. This provides flexibility to store various types of data, including files.
+
-### Getting Keys
-
-#### **Entry Point**
+#### Example: Storing Different Data Types
```python
- self.session.storage
+# String data (must convert to bytes)
+storage.set("user_name", "John Doe".encode('utf-8'))
+
+# JSON data
+import json
+user_data = {"name": "John", "age": 30, "preferences": ["AI", "NLP"]}
+storage.set("user_data", json.dumps(user_data).encode('utf-8'))
+
+# File data
+with open("image.jpg", "rb") as f:
+ image_data = f.read()
+ storage.set("profile_image", image_data)
```
-#### **Interface**
+### Retrieving Data
+
+Retrieve stored data with the `get` method:
```python
- def get(self, key: str) -> bytes:
- pass
+def get(self, key: str) -> bytes:
+ """
+ Retrieve data from persistent storage
+
+ Parameters:
+ key: Unique identifier for your data
+
+ Returns:
+ The stored data as bytes, or None if key doesn't exist
+ """
+ pass
```
-### Deleting Keys
-
-#### **Entry Point**
+#### Example: Retrieving and Converting Data
```python
- self.session.storage
+# Retrieving string data
+name_bytes = storage.get("user_name")
+if name_bytes:
+ name = name_bytes.decode('utf-8')
+ print(f"Retrieved name: {name}")
+
+# Retrieving JSON data
+import json
+user_data_bytes = storage.get("user_data")
+if user_data_bytes:
+ user_data = json.loads(user_data_bytes.decode('utf-8'))
+ print(f"User preferences: {user_data['preferences']}")
```
-#### **Interface**
+### Deleting Data
+
+Delete stored data with the `delete` method:
```python
- def delete(self, key: str) -> None:
- pass
+def delete(self, key: str) -> None:
+ """
+ Delete data from persistent storage
+
+ Parameters:
+ key: Unique identifier for the data to delete
+ """
+ pass
```
+## Best Practices
+
+- **Use consistent key naming**: Create a consistent naming scheme for your keys to avoid conflicts and make your code more maintainable.
+- **Check for existence**: Always check if data exists before processing it, as the key might not be found.
+- **Serialize complex data**: Convert complex objects to JSON or other serialized formats before storing.
+- **Handle errors**: Wrap storage operations in try/except blocks to handle potential errors gracefully.
+
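The existence-check and error-handling practices can be combined into one small helper. The sketch below is illustrative only: `FakeStorage` is an in-memory stand-in for `self.session.storage`, and `load_json` is a hypothetical wrapper, not part of the plugin SDK:

```python
import json

class FakeStorage:
    """In-memory stand-in for self.session.storage (illustration only)."""
    def __init__(self):
        self._data = {}

    def set(self, key: str, val: bytes) -> None:
        self._data[key] = val

    def get(self, key: str):
        return self._data.get(key)

def load_json(storage, key: str, default=None):
    """Read a JSON value, falling back to a default on missing or corrupt data."""
    try:
        raw = storage.get(key)
        if raw is None:
            return default
        return json.loads(raw.decode("utf-8"))
    except (ValueError, UnicodeDecodeError):
        return default

storage = FakeStorage()
storage.set("prefs", json.dumps({"theme": "dark"}).encode("utf-8"))
print(load_json(storage, "prefs"))        # {'theme': 'dark'}
print(load_json(storage, "missing", {}))  # {}
```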
+## Common Use Cases
+
+- **User Preferences**: Store user settings and preferences between sessions
+- **Conversation History**: Maintain context from previous conversations
+- **API Tokens**: Store authentication tokens securely
+- **Cached Data**: Store frequently accessed data to reduce API calls
+- **File Storage**: Store user-uploaded files or generated content
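As a sketch of the conversation-history use case, the helper below appends messages to a JSON list under a namespaced key. `FakeStorage` and `append_message` are illustrative stand-ins; in a real plugin you would pass `self.session.storage` instead:

```python
import json

class FakeStorage:
    """In-memory stand-in for self.session.storage (illustration only)."""
    def __init__(self):
        self._data = {}

    def set(self, key: str, val: bytes) -> None:
        self._data[key] = val

    def get(self, key: str):
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

def append_message(storage, conversation_id: str, message: dict) -> list:
    """Append one message to the stored history and return the full history."""
    key = f"history:{conversation_id}"  # namespaced key to avoid collisions
    raw = storage.get(key)
    history = json.loads(raw.decode("utf-8")) if raw else []
    history.append(message)
    storage.set(key, json.dumps(history).encode("utf-8"))
    return history

storage = FakeStorage()
append_message(storage, "42", {"role": "user", "content": "Hello"})
history = append_message(storage, "42", {"role": "assistant", "content": "Hi there"})
print(len(history))  # 2

# Remove the history once the conversation ends
storage.delete("history:42")
print(storage.get("history:42"))  # None
```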
+
{/*
Contributing Section
DO NOT edit this section!
diff --git a/plugin-dev-en/0412-model-schema.mdx b/plugin-dev-en/0412-model-schema.mdx
index 71a04720..a8aa980e 100644
--- a/plugin-dev-en/0412-model-schema.mdx
+++ b/plugin-dev-en/0412-model-schema.mdx
@@ -7,386 +7,947 @@ dimensions:
standard_title: Model Schema
language: en
title: Model API Interface
-description: This document provides detailed interface specifications required for
- Dify model plugin development, including model provider implementation, interface
- definitions for five model types (LLM, TextEmbedding, Rerank, Speech2text, Text2speech),
- and complete specifications for related data structures such as PromptMessage and
- LLMResult. The document serves as a development reference for developers implementing
- various model integrations.
+description: Comprehensive guide to the Dify model plugin API including implementation requirements for LLM, TextEmbedding, Rerank, Speech2text, and Text2speech models, with detailed specifications for all related data structures.
---
-This section introduces the interface methods and parameter descriptions that providers and each model type need to implement. Before developing a model plugin, you may first need to read [Model Design Rules](/plugin-dev-en/0411-model-designing-rules) and [Model Plugin Introduction](/plugin-dev-en/0131-model-plugin-introduction).
+## Introduction
-### Model Provider
+This document details the interfaces and data structures required to implement Dify model plugins. It serves as a technical reference for developers integrating AI models with the Dify platform.
-Inherit the `__base.model_provider.ModelProvider` base class and implement the following interface:
+
+Before diving into this API reference, we recommend first reading the [Model Design Rules](/plugin-dev-en/0411-model-designing-rules) and [Model Plugin Introduction](/plugin-dev-en/0131-model-plugin-introduction) for conceptual understanding.
+
-```python
+This reference covers:
+
+- **Model Provider**: How to implement model provider classes for different AI service providers
+- **Model Types**: Implementation details for the supported model types: LLM, Embedding, Rerank, Speech2Text, Text2Speech, and Moderation
+- **Data Structures**: Comprehensive reference for all data structures used in the model API
+- **Error Handling**: Guidelines for proper error mapping and exception handling
+
+## Model Provider
+
+Every model provider must inherit from the `__base.model_provider.ModelProvider` base class and implement the credential validation interface.
+
+### Provider Credential Validation
+
+
+```python Core Implementation
def validate_provider_credentials(self, credentials: dict) -> None:
"""
- Validate provider credentials
- You can choose any validate_credentials method of model type or implement validate method by yourself,
- such as: get model list api
-
- if validate failed, raise exception
-
- :param credentials: provider credentials, credentials form defined in `provider_credential_schema`.
+ Validate provider credentials by making a test API call
+
+ Parameters:
+ credentials: Provider credentials as defined in `provider_credential_schema`
+
+ Raises:
+ CredentialsValidateFailedError: If validation fails
"""
+ try:
+ # Example implementation - validate using an LLM model instance
+ model_instance = self.get_model_instance(ModelType.LLM)
+ model_instance.validate_credentials(
+ model="example-model",
+ credentials=credentials
+ )
+ except Exception as ex:
+        logger.exception("Credential validation failed")
+ raise CredentialsValidateFailedError(f"Invalid credentials: {str(ex)}")
```
-* `credentials` (object) Credential information
-
-The credential parameters are defined by the provider YAML configuration file's `provider_credential_schema`, passed in as `api_key`, etc. If validation fails, please throw a `errors.validate.CredentialsValidateFailedError` error. **Note: Predefined models need to fully implement this interface, while custom model providers only need to implement it simply as follows:**
-
-```python
+```python Custom Model Provider
class XinferenceProvider(Provider):
def validate_provider_credentials(self, credentials: dict) -> None:
+ """
+ For custom-only model providers, a simple implementation is sufficient
+ as validation happens at the model level
+ """
pass
```
+
-### Models
+**Parameter:**
+
+- `credentials` (object): Credential information as defined in the provider's YAML configuration under `provider_credential_schema`. Typically includes fields like `api_key`, `organization_id`, etc.
+
-Models are divided into 5 different types, with different base classes to inherit from and different methods to implement for each type.
+
+If validation fails, your implementation must raise a `CredentialsValidateFailedError` exception. This ensures proper error handling in the Dify UI.
+
-#### Common Interfaces
+
+For predefined model providers, you should implement a thorough validation method that verifies the credentials work with your API. For custom model providers (where each model has its own credentials), a simplified implementation is sufficient.
+
-All models need to implement the following 2 methods consistently:
+## Models
-* Model credential validation
+Dify supports several distinct model types (LLM, Text Embedding, Rerank, Speech2Text, Text2Speech, and Moderation), each requiring implementation of specific interfaces. However, all model types share some common requirements.
-Similar to provider credential validation, this validates individual models.
+### Common Interfaces
-```python
+Every model implementation, regardless of type, must implement these two fundamental methods:
+
+#### 1. Model Credential Validation
+
+
+```python Implementation
def validate_credentials(self, model: str, credentials: dict) -> None:
"""
- Validate model credentials
-
- :param model: model name
- :param credentials: model credentials
- :return:
+ Validate that the provided credentials work with the specified model
+
+ Parameters:
+ model: The specific model identifier (e.g., "gpt-4")
+ credentials: Authentication details for the model
+
+ Raises:
+ CredentialsValidateFailedError: If validation fails
"""
+ try:
+ # Make a lightweight API call to verify credentials
+ # Example: List available models or check account status
+ response = self._api_client.validate_api_key(credentials["api_key"])
+
+ # Verify the specific model is available if applicable
+ if model not in response.get("available_models", []):
+ raise CredentialsValidateFailedError(f"Model {model} is not available")
+
+ except ApiException as e:
+ raise CredentialsValidateFailedError(str(e))
```
+
-Parameters:
-* `model` (string) Model name
-* `credentials` (object) Credential information
+**Parameters:**
+
+- `model` (string): The specific model identifier to validate (e.g., "gpt-4", "claude-3-opus")
+- `credentials` (object): Credential information as defined in the provider's configuration
+
-The credential parameters are defined by the provider YAML configuration file's `provider_credential_schema` or `model_credential_schema`, passed in as `api_key`, etc. If validation fails, please throw a `errors.validate.CredentialsValidateFailedError` error.
+#### 2. Error Mapping
-* Invocation error mapping table
-
-When a model invocation exception occurs, it needs to be mapped to a specified `InvokeError` type in Runtime, which helps Dify handle different errors differently. Runtime Errors:
-
-* `InvokeConnectionError` Connection error during invocation
-* `InvokeServerUnavailableError` Service provider unavailable
-* `InvokeRateLimitError` Rate limit reached
-* `InvokeAuthorizationError` Authentication failed
-* `InvokeBadRequestError` Incorrect parameters passed
-
-```python
+
+```python Implementation
@property
def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]:
"""
- Map model invoke error to unified error
- The key is the error type thrown to the caller
- The value is the error type thrown by the model,
- which needs to be converted into a unified error type for the caller.
-
- :return: Invoke error mapping
+ Map provider-specific exceptions to standardized Dify error types
+
+ Returns:
+ Dictionary mapping Dify error types to lists of provider exception types
"""
+ return {
+ InvokeConnectionError: [
+ requests.exceptions.ConnectionError,
+ requests.exceptions.Timeout,
+ ConnectionRefusedError
+ ],
+ InvokeServerUnavailableError: [
+ ServiceUnavailableError,
+ HTTPStatusError
+ ],
+ InvokeRateLimitError: [
+ RateLimitExceededError,
+ QuotaExceededError
+ ],
+ InvokeAuthorizationError: [
+ AuthenticationError,
+ InvalidAPIKeyError,
+ PermissionDeniedError
+ ],
+ InvokeBadRequestError: [
+ InvalidRequestError,
+ ValidationError
+ ]
+ }
+```
+
+
+Dify defines five standardized invocation error types:
+
+- `InvokeConnectionError`: Network connection failures, timeouts
+- `InvokeServerUnavailableError`: Service provider is down or unavailable
+- `InvokeRateLimitError`: Rate limits or quota limits reached
+- `InvokeAuthorizationError`: Authentication or permission issues
+- `InvokeBadRequestError`: Invalid parameters or requests
+
+You can alternatively raise these standardized error types directly in your code instead of relying on the error mapping. This approach gives you more control over error messages.
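For instance, a credential check might raise the standardized type directly. In this sketch the error classes are local stand-ins that mirror the names of Dify's runtime error types, so the snippet is self-contained:

```python
# Local stand-ins mirroring the names of Dify's runtime error types
class InvokeError(Exception):
    pass

class InvokeAuthorizationError(InvokeError):
    pass

def require_api_key(credentials: dict) -> str:
    # Raise the standardized error type directly instead of
    # mapping a provider-specific exception afterwards
    api_key = credentials.get("api_key")
    if not api_key:
        raise InvokeAuthorizationError("API key is missing or empty")
    return api_key

try:
    require_api_key({})
except InvokeError as exc:
    print(type(exc).__name__)  # InvokeAuthorizationError
```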
+
+
+### LLM Implementation
+
+To implement a Large Language Model provider, inherit from the `__base.large_language_model.LargeLanguageModel` base class and implement these methods:
+
+#### 1. Model Invocation
+
+This core method handles both streaming and non-streaming API calls to language models.
+
+
+```python Core Implementation
+def _invoke(
+ self,
+ model: str,
+ credentials: dict,
+ prompt_messages: list[PromptMessage],
+ model_parameters: dict,
+ tools: Optional[list[PromptMessageTool]] = None,
+ stop: Optional[list[str]] = None,
+ stream: bool = True,
+ user: Optional[str] = None
+) -> Union[LLMResult, Generator[LLMResultChunk, None, None]]:
+ """
+ Invoke the language model
+ """
+ # Prepare API parameters
+ api_params = self._prepare_api_parameters(
+ model,
+ credentials,
+ prompt_messages,
+ model_parameters,
+ tools,
+ stop
+ )
+
+ try:
+ # Choose between streaming and non-streaming implementation
+ if stream:
+ return self._invoke_stream(model, api_params, user)
+ else:
+ return self._invoke_sync(model, api_params, user)
+
+ except Exception as e:
+ # Map errors using the error mapping property
+ self._handle_api_error(e)
+
+# Helper methods for streaming and non-streaming calls
+def _invoke_stream(self, model, api_params, user):
+ # Implement streaming call and yield chunks
+ pass
+
+def _invoke_sync(self, model, api_params, user):
+ # Implement synchronous call and return complete result
+ pass
+```
+
+**Parameters:**
+
+- `model` (string): Model identifier (e.g., "gpt-4", "claude-3")
+- `credentials` (object): Authentication credentials for the API
+- `prompt_messages` (array[PromptMessage]): Message list in Dify's standardized format:
+  - For `completion` models: Include a single `UserPromptMessage`
+  - For `chat` models: Include `SystemPromptMessage`, `UserPromptMessage`, `AssistantPromptMessage`, and `ToolPromptMessage` as needed
+- `model_parameters` (object): Model-specific parameters (temperature, top_p, etc.) as defined in the model's YAML configuration
+- `tools` (array[PromptMessageTool], optional): Tool definitions for function calling capabilities
+- `stop` (array[string], optional): Stop sequences that will halt model generation when encountered
+- `stream` (bool): Whether to return a streaming response
+- `user` (string, optional): User identifier for API monitoring
+
+**Returns:**
+
+- When `stream=True`: a generator yielding `LLMResultChunk` objects as they become available
+- When `stream=False`: a complete `LLMResult` object with the full generated text
+
+
+We recommend implementing separate helper methods for streaming and non-streaming calls to keep your code organized and maintainable.
+
+
+#### 2. Token Counting
+
+
+```python Implementation
+def get_num_tokens(
+ self,
+ model: str,
+ credentials: dict,
+ prompt_messages: list[PromptMessage],
+ tools: Optional[list[PromptMessageTool]] = None
+) -> int:
+ """
+ Calculate the number of tokens in the prompt
+ """
+ # Convert prompt_messages to the format expected by the tokenizer
+ text = self._convert_messages_to_text(prompt_messages)
+
+ try:
+ # Use the appropriate tokenizer for this model
+ tokenizer = self._get_tokenizer(model)
+ return len(tokenizer.encode(text))
+ except Exception:
+ # Fall back to a generic tokenizer
+ return self._get_num_tokens_by_gpt2(text)
+```
+
+
+
+If the model doesn't provide a tokenizer, you can use the base class's `_get_num_tokens_by_gpt2(text)` method for a reasonable approximation.
+
+
+#### 3. Custom Model Schema (Optional)
+
+
+```python Implementation
+def get_customizable_model_schema(
+ self,
+ model: str,
+ credentials: dict
+) -> Optional[AIModelEntity]:
+ """
+ Get parameter schema for custom models
+ """
+ # For fine-tuned models, you might return the base model's schema
+ if model.startswith("ft:"):
+ base_model = self._extract_base_model(model)
+ return self._get_predefined_model_schema(base_model)
+
+ # For standard models, return None to use the predefined schema
+ return None
+```
+
+
+
+This method is only necessary for providers that support custom models. It allows custom models to inherit parameter rules from base models.
+
+
+### TextEmbedding Implementation
+
+
+Text embedding models convert text into high-dimensional vectors that capture semantic meaning, which is useful for retrieval, similarity search, and classification.
+
+
+To implement a Text Embedding provider, inherit from the `__base.text_embedding_model.TextEmbeddingModel` base class:
+
+#### 1. Core Embedding Method
+
+
+```python Implementation
+def _invoke(
+ self,
+ model: str,
+ credentials: dict,
+ texts: list[str],
+ user: Optional[str] = None
+) -> TextEmbeddingResult:
+ """
+ Generate embedding vectors for multiple texts
+ """
+ # Set up API client with credentials
+ client = self._get_client(credentials)
+
+ # Handle batching if needed
+ batch_size = self._get_batch_size(model)
+ all_embeddings = []
+ total_tokens = 0
+ start_time = time.time()
+
+ # Process in batches to avoid API limits
+ for i in range(0, len(texts), batch_size):
+ batch = texts[i:i+batch_size]
+
+ # Make API call to the embeddings endpoint
+ response = client.embeddings.create(
+ model=model,
+ input=batch,
+ user=user
+ )
+
+ # Extract embeddings from response
+ batch_embeddings = [item.embedding for item in response.data]
+ all_embeddings.extend(batch_embeddings)
+
+ # Track token usage
+ total_tokens += response.usage.total_tokens
+
+ # Calculate usage metrics
+ elapsed_time = time.time() - start_time
+ usage = self._create_embedding_usage(
+ model=model,
+ tokens=total_tokens,
+ latency=elapsed_time
+ )
+
+ return TextEmbeddingResult(
+ model=model,
+ embeddings=all_embeddings,
+ usage=usage
+ )
+```
+
+**Parameters:**
+
+- `model` (string): Embedding model identifier
+- `credentials` (object): Authentication credentials for the embedding service
+- `texts` (array[string]): List of text inputs to embed
+- `user` (string, optional): User identifier for API monitoring
+
+**Returns:**
+
+A `TextEmbeddingResult` containing:
+
+- `model`: The model used for embedding
+- `embeddings`: List of embedding vectors corresponding to the input texts
+- `usage`: Metadata about token usage and costs
+
+
+#### 2. Token Counting Method
+
+
+```python Implementation
+def get_num_tokens(
+ self,
+ model: str,
+ credentials: dict,
+ texts: list[str]
+) -> int:
+ """
+ Calculate the number of tokens in the texts to be embedded
+ """
+ # Join all texts to estimate token count
+ combined_text = " ".join(texts)
+
+ try:
+ # Use the appropriate tokenizer for this model
+ tokenizer = self._get_tokenizer(model)
+ return len(tokenizer.encode(combined_text))
+ except Exception:
+ # Fall back to a generic tokenizer
+ return self._get_num_tokens_by_gpt2(combined_text)
+```
+
+
+
+For embedding models, accurate token counting is important for cost estimation, but not critical for functionality. The `_get_num_tokens_by_gpt2` method provides a reasonable approximation for most models.
+
+
+### Rerank Implementation
+
+
+Reranking models help improve search quality by re-ordering a set of candidate documents based on their relevance to a query, typically after an initial retrieval phase.
+
+
+To implement a Reranking provider, inherit from the `__base.rerank_model.RerankModel` base class:
+
+
+```python Implementation
+def _invoke(
+ self,
+ model: str,
+ credentials: dict,
+ query: str,
+ docs: list[str],
+ score_threshold: Optional[float] = None,
+ top_n: Optional[int] = None,
+ user: Optional[str] = None
+) -> RerankResult:
+ """
+ Rerank documents based on relevance to the query
+ """
+ # Set up API client with credentials
+ client = self._get_client(credentials)
+
+ # Prepare request data
+ request_data = {
+ "query": query,
+ "documents": docs,
+ }
+
+ # Call reranking API endpoint
+ response = client.rerank(
+ model=model,
+ **request_data,
+ user=user
+ )
+
+ # Process results
+ ranked_results = []
+    for result in response.results:
+ # Create RerankDocument for each result
+ doc = RerankDocument(
+ index=result.document_index, # Original index in docs list
+ text=docs[result.document_index], # Original text
+ score=result.relevance_score # Relevance score
+ )
+ ranked_results.append(doc)
+
+ # Sort by score in descending order
+ ranked_results.sort(key=lambda x: x.score, reverse=True)
+
+ # Apply score threshold filtering if specified
+ if score_threshold is not None:
+ ranked_results = [doc for doc in ranked_results if doc.score >= score_threshold]
+
+ # Apply top_n limit if specified
+ if top_n is not None and top_n > 0:
+ ranked_results = ranked_results[:top_n]
+
+ return RerankResult(
+ model=model,
+ docs=ranked_results
+ )
+```
+
+
+**Parameters:**
+
+- `model` (string): Reranking model identifier
+- `credentials` (object): Authentication credentials for the API
+- `query` (string): The search query text
+- `docs` (array[string]): List of document texts to be reranked
+- `score_threshold` (float, optional): Minimum score threshold for filtering results
+- `top_n` (int, optional): Limit on the number of results to return
+- `user` (string, optional): User identifier for API monitoring
+
+**Returns:**
+
+A `RerankResult` containing:
+
+- `model`: The model used for reranking
+- `docs`: List of `RerankDocument` objects with index, text, and score
+
+
+
+Reranking can be computationally expensive, especially with large document sets. Implement batching for large document collections to avoid timeouts or excessive resource consumption.
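Such batching can be sketched with a simple slicing helper (`iter_batches` and the batch size of 32 are illustrative, not part of the Dify SDK):

```python
from typing import Iterator

def iter_batches(docs: list, batch_size: int = 32) -> Iterator[list]:
    """Yield successive fixed-size slices of the document list."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]

docs = [f"doc-{i}" for i in range(70)]
batches = list(iter_batches(docs, batch_size=32))
print([len(b) for b in batches])  # [32, 32, 6]
```

Each batch can then be sent in its own rerank request, and the per-batch scores merged before applying `score_threshold` and `top_n`.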
+
+
+### Speech2Text Implementation
+
+
+Speech-to-text models convert spoken language from audio files into written text, enabling applications like transcription services, voice commands, and accessibility features.
+
+
+To implement a Speech-to-Text provider, inherit from the `__base.speech2text_model.Speech2TextModel` base class:
+
+
+```python Implementation
+def _invoke(
+ self,
+ model: str,
+ credentials: dict,
+ file: IO[bytes],
+ user: Optional[str] = None
+) -> str:
+ """
+ Convert speech audio to text
+ """
+ # Set up API client with credentials
+ client = self._get_client(credentials)
+
+ try:
+ # Determine the file format
+ file_format = self._detect_audio_format(file)
+
+ # Prepare the file for API submission
+ # Most APIs require either a file path or binary data
+ audio_data = file.read()
+
+ # Call the speech-to-text API
+ response = client.audio.transcriptions.create(
+ model=model,
+            file=(f"audio.{file_format}", audio_data),  # filename extension matches the detected format
+ user=user
+ )
+
+ # Extract and return the transcribed text
+ return response.text
+
+ except Exception as e:
+ # Map to appropriate error type
+ self._handle_api_error(e)
+
+ finally:
+ # Reset file pointer for potential reuse
+ file.seek(0)
```
-You can also directly throw corresponding Errors and define them as follows, so that in subsequent calls you can directly throw exceptions like `InvokeConnectionError`.
-
-#### LLM
-
-Inherit the `__base.large_language_model.LargeLanguageModel` base class and implement the following interface:
-
-* LLM Invocation
-
-Implement the core method for LLM invocation, which can support both streaming and synchronous responses.
-
-```python
-def _invoke(self, model: str, credentials: dict,
- prompt_messages: list[PromptMessage], model_parameters: dict,
- tools: Optional[list[PromptMessageTool]] = None, stop: Optional[list[str]] = None,
- stream: bool = True, user: Optional[str] = None) \
- -> Union[LLMResult, Generator]:
+```python Helper Methods
+def _detect_audio_format(self, file: IO[bytes]) -> str:
"""
- Invoke large language model
-
- :param model: model name
- :param credentials: model credentials
- :param prompt_messages: prompt messages
- :param model_parameters: model parameters
- :param tools: tools for tool calling
- :param stop: stop words
- :param stream: is stream response
- :param user: unique user id
- :return: full response or stream response chunk generator result
+ Detect the audio format based on file header
"""
+ # Read the first few bytes to check the file signature
+ header = file.read(12)
+ file.seek(0) # Reset file pointer
+
+ # Check for common audio format signatures
+ if header.startswith(b'RIFF') and header[8:12] == b'WAVE':
+ return 'wav'
+ elif header.startswith(b'ID3') or header.startswith(b'\xFF\xFB'):
+ return 'mp3'
+ elif header.startswith(b'OggS'):
+ return 'ogg'
+ elif header.startswith(b'fLaC'):
+ return 'flac'
+ else:
+ # Default or additional format checks
+ return 'mp3' # Default assumption
+```
+
+**Parameters:**
+
+- `model` (string): Speech-to-text model identifier
+- `credentials` (object): Authentication credentials for the API
+- `file` (IO[bytes]): Binary file object containing the audio to transcribe
+- `user` (string, optional): User identifier for API monitoring
+
+**Returns:**
+
+A string containing the transcribed text from the audio file
+
+
+
+Audio format detection is important for proper handling of different file types. Consider implementing a helper method to detect the format from the file header as shown in the example.
+
+
+
+Some speech-to-text APIs have file size limitations. Consider implementing chunking for large audio files if necessary.
+
+
+### Text2Speech Implementation
+
+
+Text-to-speech models convert written text into natural-sounding speech, enabling applications such as voice assistants, screen readers, and audio content generation.
+
+
+To implement a Text-to-Speech provider, inherit from the `__base.text2speech_model.Text2SpeechModel` base class:
+
+
+```python Implementation
+def _invoke(
+ self,
+ model: str,
+ credentials: dict,
+ content_text: str,
+ streaming: bool,
+ user: Optional[str] = None
+) -> Union[bytes, Generator[bytes, None, None]]:
+ """
+ Convert text to speech audio
+ """
+ # Set up API client with credentials
+ client = self._get_client(credentials)
+
+ # Get voice settings based on model
+ voice = self._get_voice_for_model(model)
+
+ try:
+ # Choose implementation based on streaming preference
+ if streaming:
+ return self._stream_audio(
+ client=client,
+ model=model,
+ text=content_text,
+ voice=voice,
+ user=user
+ )
+ else:
+ return self._generate_complete_audio(
+ client=client,
+ model=model,
+ text=content_text,
+ voice=voice,
+ user=user
+ )
+ except Exception as e:
+ self._handle_api_error(e)
```
-* Parameters:
- * `model` (string) Model name
- * `credentials` (object) Credential information
-
-The credential parameters are defined by the provider YAML configuration file's `provider_credential_schema` or `model_credential_schema`, passed in as `api_key`, etc.
-
-* `prompt_messages` (array\[[PromptMessage](#promptmessage)]) Prompt list
-
-If the model is of `Completion` type, the list only needs to include one [UserPromptMessage](#userpromptmessage) element; if the model is of `Chat` type, different messages need to be passed in as a list of [SystemPromptMessage](#systempromptmessage), [UserPromptMessage](#userpromptmessage), [AssistantPromptMessage](#assistantpromptmessage), [ToolPromptMessage](#toolpromptmessage) elements
-
-* `model_parameters` (object) Model parameters defined by the model YAML configuration's `parameter_rules`.
-
-* `tools` (array\[[PromptMessageTool](#promptmessagetool)]) \[optional] Tool list, equivalent to `function` in `function calling`. This is the tool list passed to tool calling.
-
-* `stop` (array\[string]) \[optional] Stop sequence. The model response will stop output before the string defined in the stop sequence.
-
-* `stream` (bool) Whether to stream output, default is True
-For streaming output, it returns Generator\[[LLMResultChunk](#llmresultchunk)], for non-streaming output, it returns [LLMResult](#llmresult).
-
-* `user` (string) \[optional] A unique identifier for the user that can help the provider monitor and detect abuse.
-
-* Return Value
-
-For streaming output, it returns Generator\[[LLMResultChunk](#llmresultchunk)], for non-streaming output, it returns [LLMResult](#llmresult).
-
-* Pre-calculate input tokens
-
-If the model does not provide a pre-calculation tokens interface, you can directly return 0.
-
-```python
-def get_num_tokens(self, model: str, credentials: dict, prompt_messages: list[PromptMessage],
- tools: Optional[list[PromptMessageTool]] = None) -> int:
+```python Helper Methods
+def _stream_audio(self, client, model, text, voice, user=None):
"""
- Get number of tokens for given prompt messages
-
- :param model: model name
- :param credentials: model credentials
- :param prompt_messages: prompt messages
- :param tools: tools for tool calling
- :return:
+ Implementation for streaming audio output
"""
+ # Make API request with stream=True
+ response = client.audio.speech.create(
+ model=model,
+ voice=voice,
+ input=text,
+ stream=True,
+ user=user
+ )
+
+ # Yield chunks as they arrive
+ for chunk in response:
+ if chunk:
+ yield chunk
+
+def _generate_complete_audio(self, client, model, text, voice, user=None):
+ """
+ Implementation for complete audio file generation
+ """
+ # Make API request for complete audio
+ response = client.audio.speech.create(
+ model=model,
+ voice=voice,
+ input=text,
+ user=user
+ )
+
+ # Get audio data as bytes
+ audio_data = response.content
+ return audio_data
+```
+
+**Parameters:**
+
+- `model` (string): Text-to-speech model identifier
+- `credentials` (object): Authentication credentials for the API
+- `content_text` (string): Text content to be converted to speech
+- `streaming` (bool): Whether to return streaming audio or a complete file
+- `user` (string, optional): User identifier for API monitoring
+
+**Returns:**
+
+- When `streaming=True`: a generator yielding audio chunks as they become available
+- When `streaming=False`: complete audio data as bytes
+
+
+
+Most text-to-speech APIs require you to specify a voice along with the model. Consider implementing a mapping between Dify's model identifiers and the provider's voice options.
+
+
+
+Long text inputs may need to be chunked for better speech synthesis quality. Consider implementing text preprocessing to handle punctuation, numbers, and special characters properly.
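Sentence-level chunking of long inputs could be sketched as follows; the sentence-splitting regex and the 400-character default are illustrative, and real limits depend on the provider:

```python
import re

def chunk_text(text: str, max_chars: int = 400) -> list:
    """Split text on sentence boundaries, then pack sentences into size-limited chunks."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("First sentence. Second sentence! Third?", max_chars=20))
# ['First sentence.', 'Second sentence!', 'Third?']
```

Each chunk can then be synthesized separately and the resulting audio segments concatenated or streamed in order.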
+
+
+
+### Moderation Implementation
+
+
+Moderation models analyze content for potentially harmful, inappropriate, or unsafe material, helping maintain platform safety and content policies.
+
+
+To implement a Moderation provider, inherit from the `__base.moderation_model.ModerationModel` base class:
+
+
+```python Implementation
+def _invoke(
+ self,
+ model: str,
+ credentials: dict,
+ text: str,
+ user: Optional[str] = None
+) -> bool:
+ """
+ Analyze text for harmful content
+
+ Returns:
+ bool: False if the text is safe, True if it contains harmful content
+ """
+ # Set up API client with credentials
+ client = self._get_client(credentials)
+
+ try:
+ # Call moderation API
+ response = client.moderations.create(
+ model=model,
+ input=text,
+ user=user
+ )
+
+ # Check if any categories were flagged
+ result = response.results[0]
+
+ # Return True if flagged in any category, False if safe
+ return result.flagged
+
+    except Exception as e:
+        # Log the error and fail open (treat the text as safe) when the API
+        # call fails; production systems may prefer to fail closed and block
+        # content while moderation is unavailable
+        logger.error(f"Moderation API error: {str(e)}")  # assumes a module-level logger
+        return False
```
-Parameter explanations are the same as in `LLM Invocation` above. This interface needs to calculate based on the appropriate `tokenizer` for the corresponding `model`. If the corresponding model does not provide a `tokenizer`, you can use the `_get_num_tokens_by_gpt2(text: str)` method in the `AIModel` base class for calculation.
-
-* Get custom model rules [optional]
-
-```python
-def get_customizable_model_schema(self, model: str, credentials: dict) -> Optional[AIModelEntity]:
+```python Detailed Implementation
+def _invoke(
+ self,
+ model: str,
+ credentials: dict,
+ text: str,
+ user: Optional[str] = None
+) -> bool:
"""
- Get customizable model schema
-
- :param model: model name
- :param credentials: model credentials
- :return: model schema
+ Analyze text for harmful content with detailed category checking
"""
+ # Set up API client with credentials
+ client = self._get_client(credentials)
+
+ try:
+ # Call moderation API
+ response = client.moderations.create(
+ model=model,
+ input=text,
+ user=user
+ )
+
+ # Get detailed category results
+ result = response.results[0]
+ categories = result.categories
+
+ # Check specific categories based on your application's needs
+ # For example, you might want to flag certain categories but not others
+ critical_violations = [
+ categories.harassment,
+ categories.hate,
+ categories.self_harm,
+ categories.sexual,
+ categories.violence
+ ]
+
+ # Flag content if any critical category is violated
+ return any(critical_violations)
+
+    except Exception as e:
+        # Delegate to provider-specific error handling; if the handler
+        # does not re-raise, fall back to treating the text as safe
+        self._handle_api_error(e)
+        return False
```
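The category-filtering step above is just a predicate over a set of boolean flags, so it can be factored out and tested in isolation. A runnable sketch, with a plain dict standing in for the provider's category result object (the category names follow the example above, but the helper itself is illustrative, not part of the plugin API):

```python
# Standalone sketch of category-based flagging. A plain dict stands in
# for the provider's category result object; names are illustrative.

CRITICAL_CATEGORIES = ("harassment", "hate", "self_harm", "sexual", "violence")

def is_flagged(categories: dict, critical=CRITICAL_CATEGORIES) -> bool:
    """Return True if any critical category is marked as violated."""
    return any(categories.get(name, False) for name in critical)

# Categories outside the critical set (e.g. "spam") do not trigger a flag.
verdict = is_flagged({"harassment": False, "hate": True, "spam": False})
```

Keeping the critical set in one place makes it easy to tune which categories block content without touching the API-calling code.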
+
-When a provider supports adding custom LLMs, this method can be implemented to allow custom models to obtain model rules. By default, it returns None.
+
+**Parameters:**
+
+- `model` (string): Moderation model identifier
+- `credentials` (object): Authentication credentials for the API
+- `text` (string): Text content to be analyzed
+- `user` (string, optional): User identifier for API monitoring
+
-For most fine-tuned models under the `OpenAI` provider, the base model can be obtained through the fine-tuned model name, such as `gpt-3.5-turbo-1106`, and then return the predefined parameter rules of the base model. Refer to the specific implementation of [OpenAI](https://github.com/langgenius/dify-official-plugins/tree/main/models/openai).
+
+**Returns:**
+
+Boolean indicating content safety:
+- `False`: the content is safe
+- `True`: the content contains harmful material
+
+
-#### TextEmbedding
+
+Moderation is often used as a safety mechanism. Consider the implications of false negatives (letting harmful content through) versus false positives (blocking safe content) when implementing your solution.
+
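One way to make that trade-off explicit is to parameterize the fallback behavior instead of hard-coding fail-open as in the examples above. A hypothetical sketch (`moderate_with_fallback` and `fail_closed` are illustrative names, not part of the plugin API):

```python
# Hypothetical sketch: configurable fallback when the moderation API fails.
# fail_closed=True blocks content during outages (fewer false negatives);
# fail_closed=False lets it through (fewer false positives).

def moderate_with_fallback(call_api, text: str, fail_closed: bool = False) -> bool:
    """Return the moderation verdict, or the fallback policy on API failure."""
    try:
        return call_api(text)
    except Exception:
        return fail_closed

def unavailable_api(text: str) -> bool:
    """Stand-in for a moderation call whose backend is down."""
    raise RuntimeError("moderation service unavailable")
```

With `fail_closed=True`, an outage blocks everything; with the default, it blocks nothing. Choose based on the relative cost of each error type for your application.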
-Inherit the `__base.text_embedding_model.TextEmbeddingModel` base class and implement the following interface:
-
-* Embedding Invocation
-
-```python
-def _invoke(self, model: str, credentials: dict,
- texts: list[str], user: Optional[str] = None) \
- -> TextEmbeddingResult:
- """
- Invoke large language model
-
- :param model: model name
- :param credentials: model credentials
- :param texts: texts to embed
- :param user: unique user id
- :return: embeddings result
- """
-```
-
-* Parameters:
-
-* `model` (string) Model name
-* `credentials` (object) Credential information
-
-The credential parameters are defined by the provider YAML configuration file's `provider_credential_schema` or `model_credential_schema`, passed in as `api_key`, etc.
-
-* `texts` (array\[string]) Text list, can be processed in batch
-* `user` (string) \[optional] A unique identifier for the user
-Can help the provider monitor and detect abuse.
-
-* Return:
-
-[TextEmbeddingResult](#textembeddingresult) entity.
-
-* Pre-calculate tokens
-
-```python
-def get_num_tokens(self, model: str, credentials: dict, texts: list[str]) -> int:
- """
- Get number of tokens for given prompt messages
-
- :param model: model name
- :param credentials: model credentials
- :param texts: texts to embed
- :return:
- """
-```
-
-Parameter explanations can be found in the `Embedding Invocation` section above.
-
-Similar to the `LargeLanguageModel` above, this interface needs to calculate based on the appropriate `tokenizer` for the corresponding `model`. If the corresponding model does not provide a `tokenizer`, you can use the `_get_num_tokens_by_gpt2(text: str)` method in the `AIModel` base class for calculation.
-
-#### Rerank
-
-Inherit the `__base.rerank_model.RerankModel` base class and implement the following interface:
-
-* Rerank Invocation
-
-```python
-def _invoke(self, model: str, credentials: dict,
- query: str, docs: list[str], score_threshold: Optional[float] = None, top_n: Optional[int] = None,
- user: Optional[str] = None) \
- -> RerankResult:
- """
- Invoke rerank model
-
- :param model: model name
- :param credentials: model credentials
- :param query: search query
- :param docs: docs for reranking
- :param score_threshold: score threshold
- :param top_n: top n
- :param user: unique user id
- :return: rerank result
- """
-```
-
-* Parameters:
-
-* `model` (string) Model name
-* `credentials` (object) Credential information
-The credential parameters are defined by the provider YAML configuration file's `provider_credential_schema` or `model_credential_schema`, passed in as `api_key`, etc.
-* `query` (string) Query request content
-* `docs` (array\[string]) List of segments that need to be reranked
-* `score_threshold` (float) \[optional] Score threshold
-* `top_n` (int) \[optional] Take the top n segments
-* `user` (string) \[optional] A unique identifier for the user
-Can help the provider monitor and detect abuse.
-
-* Return:
-
-[RerankResult](#rerankresult) entity.
-
-#### Speech2text
-
-Inherit the `__base.speech2text_model.Speech2TextModel` base class and implement the following interface:
-
-* Invoke
-
-```python
-def _invoke(self, model: str, credentials: dict,
- file: IO[bytes], user: Optional[str] = None) \
- -> str:
- """
- Invoke large language model
-
- :param model: model name
- :param credentials: model credentials
- :param file: audio file
- :param user: unique user id
- :return: text for given audio file
- """
-```
-
-* Parameters:
-
-* `model` (string) Model name
-* `credentials` (object) Credential information
-The credential parameters are defined by the provider YAML configuration file's `provider_credential_schema` or `model_credential_schema`, passed in as `api_key`, etc.
-* `file` (File) File stream
-* `user` (string) \[optional] A unique identifier for the user
-Can help the provider monitor and detect abuse.
-
-* Return:
-
-String after speech conversion.
-
-#### Text2speech
-
-Inherit the `__base.text2speech_model.Text2SpeechModel` base class and implement the following interface:
-
-* Invoke
-
-```python
-def _invoke(self, model: str, credentials: dict, content_text: str, streaming: bool, user: Optional[str] = None):
- """
- Invoke large language model
-
- :param model: model name
- :param credentials: model credentials
- :param content_text: text content to be translated
- :param streaming: output is streaming
- :param user: unique user id
- :return: translated audio file
- """
-```
-
-* Parameters:
-
-* `model` (string) Model name
-* `credentials` (object) Credential information
-The credential parameters are defined by the provider YAML configuration file's `provider_credential_schema` or `model_credential_schema`, passed in as `api_key`, etc.
-* `content_text` (string) Text content to be converted
-* `streaming` (bool) Whether to stream output
-* `user` (string) \[optional] A unique identifier for the user
-Can help the provider monitor and detect abuse.
-
-* Return:
-
-Audio stream after text conversion.
-
-
-#### Moderation
-
-Inherit the `__base.moderation_model.ModerationModel` base class and implement the following interface:
-
-* Invoke
-
-```python
-def _invoke(self, model: str, credentials: dict,
- text: str, user: Optional[str] = None) \
- -> bool:
- """
- Invoke large language model
-
- :param model: model name
- :param credentials: model credentials
- :param text: text to moderate
- :param user: unique user id
- :return: false if text is safe, true otherwise
- """
-```
-
-* Parameters:
-
-* `model` (string) Model name
-* `credentials` (object) Credential information
-The credential parameters are defined by the provider YAML configuration file's `provider_credential_schema` or `model_credential_schema`, passed in as `api_key`, etc.
-* `text` (string) Text content
-* `user` (string) \[optional] A unique identifier for the user
-Can help the provider monitor and detect abuse.
-
-* Return:
-
-False indicates the input text is safe, True indicates it is not.
+
+Many moderation APIs provide detailed category scores rather than just a binary result. Consider extending this implementation to return more detailed information about specific categories of harmful content if your application needs it.
+
### Entities