diff --git a/plugin-dev-en/0411-persistent-storage-kv.mdx b/plugin-dev-en/0411-persistent-storage-kv.mdx index cf09907e..8a79d7e0 100644 --- a/plugin-dev-en/0411-persistent-storage-kv.mdx +++ b/plugin-dev-en/0411-persistent-storage-kv.mdx @@ -7,63 +7,148 @@ dimensions: standard_title: Persistent Storage KV language: en title: Persistent Storage -description: This document introduces the persistent storage functionality in Dify - plugins, detailing how to use the KV database in plugins to store, retrieve, and - delete data. This feature enables plugins to persistently store data within the - same Workspace, meeting the needs for data preservation across sessions. +description: Learn how to implement persistent storage in your Dify plugins using the built-in key-value database to maintain state across interactions. --- -When examining Tools and Endpoints in plugins individually, it's not difficult to see that in most cases, they can only complete single-round interactions: request, return data, and the task ends. +## Overview -If there is data that needs to be stored long-term, such as implementing persistent memory, the plugin needs to have persistent storage capabilities. **The persistent storage mechanism allows plugins to have the ability to persistently store data within the same Workspace**. Currently, a KV database is provided to meet storage needs, and more flexible and powerful storage interfaces may be introduced in the future based on actual usage. +Most plugin tools and endpoints operate in a stateless, single-round interaction model: +1. Receive a request +2. Process data +3. Return a response +4. End the interaction -### Storing Keys +However, many real-world applications require maintaining state across multiple interactions. This is where **persistent storage** becomes essential. 
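As a concrete illustration of the difference, the sketch below keeps a per-workspace invocation counter alive across otherwise stateless calls. It is a hedged, runnable approximation: `FakeStorage` is a stand-in for `self.session.storage` (which exposes the same `set`/`get` methods) so the example runs outside a plugin.

```python
# Sketch only: `FakeStorage` mimics the documented set/get behaviour of
# self.session.storage (get returns None for a missing key) so this
# example runs outside Dify. Inside a plugin, use self.session.storage.
class FakeStorage:
    def __init__(self):
        self._data = {}

    def set(self, key: str, val: bytes) -> None:
        self._data[key] = val

    def get(self, key: str) -> bytes:
        return self._data.get(key)


def increment_counter(storage, key: str = "invocation_count") -> int:
    """Persist a per-workspace counter across otherwise stateless calls."""
    raw = storage.get(key)                        # bytes, or None if unset
    count = int(raw.decode("utf-8")) if raw else 0
    count += 1
    storage.set(key, str(count).encode("utf-8"))  # values must be bytes
    return count


storage = FakeStorage()
print(increment_counter(storage))  # 1
print(increment_counter(storage))  # 2
```

Each call reads the previous value, so the state survives between tool invocations instead of being lost when the request ends.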
-#### **Entry Point** + +The persistent storage mechanism allows plugins to store data persistently within the same workspace, enabling stateful applications and memory features. + + +Dify currently provides a key-value (KV) storage system for plugins, with plans to introduce more flexible and powerful storage interfaces in the future based on developer needs. + +## Accessing Storage + +All storage operations are performed through the `storage` object available in your plugin's session: ```python - self.session.storage +# Access the storage interface +storage = self.session.storage ``` -#### **Interface** +## Storage Operations + +### Storing Data + +Store data with the `set` method: ```python - def set(self, key: str, val: bytes) -> None: - pass +def set(self, key: str, val: bytes) -> None: + """ + Store data in persistent storage + + Parameters: + key: Unique identifier for your data + val: Binary data to store (bytes) + """ + pass ``` -Note that what is passed in is bytes, so you can actually store files in it. + +The value must be in `bytes` format. This provides flexibility to store various types of data, including files. 
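Because values are raw bytes, you are not limited to plain strings; one practical pattern is compressing larger payloads before storing them. A hedged sketch follows — the dict-backed `FakeStorage` is a stand-in for `self.session.storage`, used only so the example runs outside Dify:

```python
import gzip
import json

# Stand-in for self.session.storage so the sketch is runnable here.
class FakeStorage:
    def __init__(self):
        self._data = {}

    def set(self, key: str, val: bytes) -> None:
        self._data[key] = val

    def get(self, key: str) -> bytes:
        return self._data.get(key)


storage = FakeStorage()

# Compress a repetitive JSON payload before storing it as bytes
history = {"messages": ["hello"] * 1000}
payload = json.dumps(history).encode("utf-8")
storage.set("history", gzip.compress(payload))

# Round-trip: decompress and parse on retrieval
restored = json.loads(gzip.decompress(storage.get("history")).decode("utf-8"))
assert restored == history
```

Compression is optional, but for large, repetitive data (such as conversation history) it can substantially reduce the stored size at the cost of a little CPU on each read and write.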
+ -### Getting Keys - -#### **Entry Point** +#### Example: Storing Different Data Types ```python - self.session.storage +# String data (must convert to bytes) +storage.set("user_name", "John Doe".encode('utf-8')) + +# JSON data +import json +user_data = {"name": "John", "age": 30, "preferences": ["AI", "NLP"]} +storage.set("user_data", json.dumps(user_data).encode('utf-8')) + +# File data +with open("image.jpg", "rb") as f: + image_data = f.read() + storage.set("profile_image", image_data) ``` -#### **Interface** +### Retrieving Data + +Retrieve stored data with the `get` method: ```python - def get(self, key: str) -> bytes: - pass +def get(self, key: str) -> bytes: + """ + Retrieve data from persistent storage + + Parameters: + key: Unique identifier for your data + + Returns: + The stored data as bytes, or None if key doesn't exist + """ + pass ``` -### Deleting Keys - -#### **Entry Point** +#### Example: Retrieving and Converting Data ```python - self.session.storage +# Retrieving string data +name_bytes = storage.get("user_name") +if name_bytes: + name = name_bytes.decode('utf-8') + print(f"Retrieved name: {name}") + +# Retrieving JSON data +import json +user_data_bytes = storage.get("user_data") +if user_data_bytes: + user_data = json.loads(user_data_bytes.decode('utf-8')) + print(f"User preferences: {user_data['preferences']}") ``` -#### **Interface** +### Deleting Data + +Delete stored data with the `delete` method: ```python - def delete(self, key: str) -> None: - pass +def delete(self, key: str) -> None: + """ + Delete data from persistent storage + + Parameters: + key: Unique identifier for the data to delete + """ + pass ``` +## Best Practices + + + + Create a consistent naming scheme for your keys to avoid conflicts and make your code more maintainable. + + + Always check if data exists before processing it, as the key might not be found. + + + Convert complex objects to JSON or other serialized formats before storing. 
+ + + Wrap storage operations in try/except blocks to handle potential errors gracefully. + + + +## Common Use Cases + +- **User Preferences**: Store user settings and preferences between sessions +- **Conversation History**: Maintain context from previous conversations +- **API Tokens**: Store authentication tokens securely +- **Cached Data**: Store frequently accessed data to reduce API calls +- **File Storage**: Store user-uploaded files or generated content + {/* Contributing Section DO NOT edit this section! diff --git a/plugin-dev-en/0412-model-schema.mdx b/plugin-dev-en/0412-model-schema.mdx index 71a04720..a8aa980e 100644 --- a/plugin-dev-en/0412-model-schema.mdx +++ b/plugin-dev-en/0412-model-schema.mdx @@ -7,386 +7,947 @@ dimensions: standard_title: Model Schema language: en title: Model API Interface -description: This document provides detailed interface specifications required for - Dify model plugin development, including model provider implementation, interface - definitions for five model types (LLM, TextEmbedding, Rerank, Speech2text, Text2speech), - and complete specifications for related data structures such as PromptMessage and - LLMResult. The document serves as a development reference for developers implementing - various model integrations. +description: Comprehensive guide to the Dify model plugin API including implementation requirements for LLM, TextEmbedding, Rerank, Speech2text, and Text2speech models, with detailed specifications for all related data structures. --- -This section introduces the interface methods and parameter descriptions that providers and each model type need to implement. Before developing a model plugin, you may first need to read [Model Design Rules](/plugin-dev-en/0411-model-designing-rules) and [Model Plugin Introduction](/plugin-dev-en/0131-model-plugin-introduction). +## Introduction -### Model Provider +This document details the interfaces and data structures required to implement Dify model plugins. 
It serves as a technical reference for developers integrating AI models with the Dify platform. -Inherit the `__base.model_provider.ModelProvider` base class and implement the following interface: + +Before diving into this API reference, we recommend first reading the [Model Design Rules](/plugin-dev-en/0411-model-designing-rules) and [Model Plugin Introduction](/plugin-dev-en/0131-model-plugin-introduction) for conceptual understanding. + -```python + + + Learn how to implement model provider classes for different AI service providers + + + Implementation details for the five supported model types: LLM, Embedding, Rerank, Speech2Text, and Text2Speech + + + Comprehensive reference for all data structures used in the model API + + + Guidelines for proper error mapping and exception handling + + + +## Model Provider + +Every model provider must inherit from the `__base.model_provider.ModelProvider` base class and implement the credential validation interface. + +### Provider Credential Validation + + +```python Core Implementation def validate_provider_credentials(self, credentials: dict) -> None: """ - Validate provider credentials - You can choose any validate_credentials method of model type or implement validate method by yourself, - such as: get model list api - - if validate failed, raise exception - - :param credentials: provider credentials, credentials form defined in `provider_credential_schema`. 
+ Validate provider credentials by making a test API call + + Parameters: + credentials: Provider credentials as defined in `provider_credential_schema` + + Raises: + CredentialsValidateFailedError: If validation fails """ + try: + # Example implementation - validate using an LLM model instance + model_instance = self.get_model_instance(ModelType.LLM) + model_instance.validate_credentials( + model="example-model", + credentials=credentials + ) + except Exception as ex: + logger.exception(f"Credential validation failed") + raise CredentialsValidateFailedError(f"Invalid credentials: {str(ex)}") ``` -* `credentials` (object) Credential information - -The credential parameters are defined by the provider YAML configuration file's `provider_credential_schema`, passed in as `api_key`, etc. If validation fails, please throw a `errors.validate.CredentialsValidateFailedError` error. **Note: Predefined models need to fully implement this interface, while custom model providers only need to implement it simply as follows:** - -```python +```python Custom Model Provider class XinferenceProvider(Provider): def validate_provider_credentials(self, credentials: dict) -> None: + """ + For custom-only model providers, a simple implementation is sufficient + as validation happens at the model level + """ pass ``` + -### Models + + Credential information as defined in the provider's YAML configuration under `provider_credential_schema`. + Typically includes fields like `api_key`, `organization_id`, etc. + -Models are divided into 5 different types, with different base classes to inherit from and different methods to implement for each type. + +If validation fails, your implementation must raise a `CredentialsValidateFailedError` exception. This ensures proper error handling in the Dify UI. + -#### Common Interfaces + +For predefined model providers, you should implement a thorough validation method that verifies the credentials work with your API. 
For custom model providers (where each model has its own credentials), a simplified implementation is sufficient. + -All models need to implement the following 2 methods consistently: +## Models -* Model credential validation +Dify supports five distinct model types, each requiring implementation of specific interfaces. However, all model types share some common requirements. -Similar to provider credential validation, this validates individual models. +### Common Interfaces -```python +Every model implementation, regardless of type, must implement these two fundamental methods: + +#### 1. Model Credential Validation + + +```python Implementation def validate_credentials(self, model: str, credentials: dict) -> None: """ - Validate model credentials - - :param model: model name - :param credentials: model credentials - :return: + Validate that the provided credentials work with the specified model + + Parameters: + model: The specific model identifier (e.g., "gpt-4") + credentials: Authentication details for the model + + Raises: + CredentialsValidateFailedError: If validation fails """ + try: + # Make a lightweight API call to verify credentials + # Example: List available models or check account status + response = self._api_client.validate_api_key(credentials["api_key"]) + + # Verify the specific model is available if applicable + if model not in response.get("available_models", []): + raise CredentialsValidateFailedError(f"Model {model} is not available") + + except ApiException as e: + raise CredentialsValidateFailedError(str(e)) ``` + -Parameters: + + The specific model identifier to validate (e.g., "gpt-4", "claude-3-opus") + -* `model` (string) Model name -* `credentials` (object) Credential information + + Credential information as defined in the provider's configuration + -The credential parameters are defined by the provider YAML configuration file's `provider_credential_schema` or `model_credential_schema`, passed in as `api_key`, etc. 
If validation fails, please throw a `errors.validate.CredentialsValidateFailedError` error. +#### 2. Error Mapping -* Invocation error mapping table - -When a model invocation exception occurs, it needs to be mapped to a specified `InvokeError` type in Runtime, which helps Dify handle different errors differently. Runtime Errors: - -* `InvokeConnectionError` Connection error during invocation -* `InvokeServerUnavailableError` Service provider unavailable -* `InvokeRateLimitError` Rate limit reached -* `InvokeAuthorizationError` Authentication failed -* `InvokeBadRequestError` Incorrect parameters passed - -```python + +```python Implementation @property def _invoke_error_mapping(self) -> dict[type[InvokeError], list[type[Exception]]]: """ - Map model invoke error to unified error - The key is the error type thrown to the caller - The value is the error type thrown by the model, - which needs to be converted into a unified error type for the caller. - - :return: Invoke error mapping + Map provider-specific exceptions to standardized Dify error types + + Returns: + Dictionary mapping Dify error types to lists of provider exception types """ + return { + InvokeConnectionError: [ + requests.exceptions.ConnectionError, + requests.exceptions.Timeout, + ConnectionRefusedError + ], + InvokeServerUnavailableError: [ + ServiceUnavailableError, + HTTPStatusError + ], + InvokeRateLimitError: [ + RateLimitExceededError, + QuotaExceededError + ], + InvokeAuthorizationError: [ + AuthenticationError, + InvalidAPIKeyError, + PermissionDeniedError + ], + InvokeBadRequestError: [ + InvalidRequestError, + ValidationError + ] + } +``` + + + + + Network connection failures, timeouts + + + Service provider is down or unavailable + + + Rate limits or quota limits reached + + + Authentication or permission issues + + + Invalid parameters or requests + + + + +You can alternatively raise these standardized error types directly in your code instead of relying on the error mapping. 
This approach gives you more control over error messages. + + +### LLM Implementation + +To implement a Large Language Model provider, inherit from the `__base.large_language_model.LargeLanguageModel` base class and implement these methods: + +#### 1. Model Invocation + +This core method handles both streaming and non-streaming API calls to language models. + + +```python Core Implementation +def _invoke( + self, + model: str, + credentials: dict, + prompt_messages: list[PromptMessage], + model_parameters: dict, + tools: Optional[list[PromptMessageTool]] = None, + stop: Optional[list[str]] = None, + stream: bool = True, + user: Optional[str] = None +) -> Union[LLMResult, Generator[LLMResultChunk, None, None]]: + """ + Invoke the language model + """ + # Prepare API parameters + api_params = self._prepare_api_parameters( + model, + credentials, + prompt_messages, + model_parameters, + tools, + stop + ) + + try: + # Choose between streaming and non-streaming implementation + if stream: + return self._invoke_stream(model, api_params, user) + else: + return self._invoke_sync(model, api_params, user) + + except Exception as e: + # Map errors using the error mapping property + self._handle_api_error(e) + +# Helper methods for streaming and non-streaming calls +def _invoke_stream(self, model, api_params, user): + # Implement streaming call and yield chunks + pass + +def _invoke_sync(self, model, api_params, user): + # Implement synchronous call and return complete result + pass +``` + + + + + Model identifier (e.g., "gpt-4", "claude-3") + + + + Authentication credentials for the API + + + + Message list in Dify's standardized format: + - For `completion` models: Include a single `UserPromptMessage` + - For `chat` models: Include `SystemPromptMessage`, `UserPromptMessage`, `AssistantPromptMessage`, `ToolPromptMessage` as needed + + + + Model-specific parameters (temperature, top_p, etc.) 
as defined in the model's YAML configuration + + + + Tool definitions for function calling capabilities + + + + Stop sequences that will halt model generation when encountered + + + + Whether to return a streaming response + + + + User identifier for API monitoring + + + + + + A generator yielding chunks of the response as they become available + + + + A complete response object with the full generated text + + + + +We recommend implementing separate helper methods for streaming and non-streaming calls to keep your code organized and maintainable. + + +#### 2. Token Counting + + +```python Implementation +def get_num_tokens( + self, + model: str, + credentials: dict, + prompt_messages: list[PromptMessage], + tools: Optional[list[PromptMessageTool]] = None +) -> int: + """ + Calculate the number of tokens in the prompt + """ + # Convert prompt_messages to the format expected by the tokenizer + text = self._convert_messages_to_text(prompt_messages) + + try: + # Use the appropriate tokenizer for this model + tokenizer = self._get_tokenizer(model) + return len(tokenizer.encode(text)) + except Exception: + # Fall back to a generic tokenizer + return self._get_num_tokens_by_gpt2(text) +``` + + + +If the model doesn't provide a tokenizer, you can use the base class's `_get_num_tokens_by_gpt2(text)` method for a reasonable approximation. + + +#### 3. Custom Model Schema (Optional) + + +```python Implementation +def get_customizable_model_schema( + self, + model: str, + credentials: dict +) -> Optional[AIModelEntity]: + """ + Get parameter schema for custom models + """ + # For fine-tuned models, you might return the base model's schema + if model.startswith("ft:"): + base_model = self._extract_base_model(model) + return self._get_predefined_model_schema(base_model) + + # For standard models, return None to use the predefined schema + return None +``` + + + +This method is only necessary for providers that support custom models. 
It allows custom models to inherit parameter rules from base models. + + +### TextEmbedding Implementation + + +Text embedding models convert text into high-dimensional vectors that capture semantic meaning, which is useful for retrieval, similarity search, and classification. + + +To implement a Text Embedding provider, inherit from the `__base.text_embedding_model.TextEmbeddingModel` base class: + +#### 1. Core Embedding Method + + +```python Implementation +def _invoke( + self, + model: str, + credentials: dict, + texts: list[str], + user: Optional[str] = None +) -> TextEmbeddingResult: + """ + Generate embedding vectors for multiple texts + """ + # Set up API client with credentials + client = self._get_client(credentials) + + # Handle batching if needed + batch_size = self._get_batch_size(model) + all_embeddings = [] + total_tokens = 0 + start_time = time.time() + + # Process in batches to avoid API limits + for i in range(0, len(texts), batch_size): + batch = texts[i:i+batch_size] + + # Make API call to the embeddings endpoint + response = client.embeddings.create( + model=model, + input=batch, + user=user + ) + + # Extract embeddings from response + batch_embeddings = [item.embedding for item in response.data] + all_embeddings.extend(batch_embeddings) + + # Track token usage + total_tokens += response.usage.total_tokens + + # Calculate usage metrics + elapsed_time = time.time() - start_time + usage = self._create_embedding_usage( + model=model, + tokens=total_tokens, + latency=elapsed_time + ) + + return TextEmbeddingResult( + model=model, + embeddings=all_embeddings, + usage=usage + ) +``` + + + + + Embedding model identifier + + + + Authentication credentials for the embedding service + + + + List of text inputs to embed + + + + User identifier for API monitoring + + + + + + A structured response containing: + - model: The model used for embedding + - embeddings: List of embedding vectors corresponding to input texts + - usage: Metadata about token usage 
and costs + + + +#### 2. Token Counting Method + + +```python Implementation +def get_num_tokens( + self, + model: str, + credentials: dict, + texts: list[str] +) -> int: + """ + Calculate the number of tokens in the texts to be embedded + """ + # Join all texts to estimate token count + combined_text = " ".join(texts) + + try: + # Use the appropriate tokenizer for this model + tokenizer = self._get_tokenizer(model) + return len(tokenizer.encode(combined_text)) + except Exception: + # Fall back to a generic tokenizer + return self._get_num_tokens_by_gpt2(combined_text) +``` + + + +For embedding models, accurate token counting is important for cost estimation, but not critical for functionality. The `_get_num_tokens_by_gpt2` method provides a reasonable approximation for most models. + + +### Rerank Implementation + + +Reranking models help improve search quality by re-ordering a set of candidate documents based on their relevance to a query, typically after an initial retrieval phase. + + +To implement a Reranking provider, inherit from the `__base.rerank_model.RerankModel` base class: + + +```python Implementation +def _invoke( + self, + model: str, + credentials: dict, + query: str, + docs: list[str], + score_threshold: Optional[float] = None, + top_n: Optional[int] = None, + user: Optional[str] = None +) -> RerankResult: + """ + Rerank documents based on relevance to the query + """ + # Set up API client with credentials + client = self._get_client(credentials) + + # Prepare request data + request_data = { + "query": query, + "documents": docs, + } + + # Call reranking API endpoint + response = client.rerank( + model=model, + **request_data, + user=user + ) + + # Process results + ranked_results = [] + for i, result in enumerate(response.results): + # Create RerankDocument for each result + doc = RerankDocument( + index=result.document_index, # Original index in docs list + text=docs[result.document_index], # Original text + score=result.relevance_score # 
Relevance score + ) + ranked_results.append(doc) + + # Sort by score in descending order + ranked_results.sort(key=lambda x: x.score, reverse=True) + + # Apply score threshold filtering if specified + if score_threshold is not None: + ranked_results = [doc for doc in ranked_results if doc.score >= score_threshold] + + # Apply top_n limit if specified + if top_n is not None and top_n > 0: + ranked_results = ranked_results[:top_n] + + return RerankResult( + model=model, + docs=ranked_results + ) +``` + + + + + Reranking model identifier + + + + Authentication credentials for the API + + + + The search query text + + + + List of document texts to be reranked + + + + Optional minimum score threshold for filtering results + + + + Optional limit on number of results to return + + + + User identifier for API monitoring + + + + + + A structured response containing: + - model: The model used for reranking + - docs: List of RerankDocument objects with index, text, and score + + + + +Reranking can be computationally expensive, especially with large document sets. Implement batching for large document collections to avoid timeouts or excessive resource consumption. + + +### Speech2Text Implementation + + +Speech-to-text models convert spoken language from audio files into written text, enabling applications like transcription services, voice commands, and accessibility features. 
+ + +To implement a Speech-to-Text provider, inherit from the `__base.speech2text_model.Speech2TextModel` base class: + + +```python Implementation +def _invoke( + self, + model: str, + credentials: dict, + file: IO[bytes], + user: Optional[str] = None +) -> str: + """ + Convert speech audio to text + """ + # Set up API client with credentials + client = self._get_client(credentials) + + try: + # Determine the file format + file_format = self._detect_audio_format(file) + + # Prepare the file for API submission + # Most APIs require either a file path or binary data + audio_data = file.read() + + # Call the speech-to-text API + response = client.audio.transcriptions.create( + model=model, + file=("audio.mp3", audio_data), # Adjust filename based on actual format + user=user + ) + + # Extract and return the transcribed text + return response.text + + except Exception as e: + # Map to appropriate error type + self._handle_api_error(e) + + finally: + # Reset file pointer for potential reuse + file.seek(0) ``` -You can also directly throw corresponding Errors and define them as follows, so that in subsequent calls you can directly throw exceptions like `InvokeConnectionError`. - -#### LLM - -Inherit the `__base.large_language_model.LargeLanguageModel` base class and implement the following interface: - -* LLM Invocation - -Implement the core method for LLM invocation, which can support both streaming and synchronous responses. 
- -```python -def _invoke(self, model: str, credentials: dict, - prompt_messages: list[PromptMessage], model_parameters: dict, - tools: Optional[list[PromptMessageTool]] = None, stop: Optional[list[str]] = None, - stream: bool = True, user: Optional[str] = None) \ - -> Union[LLMResult, Generator]: +```python Helper Methods +def _detect_audio_format(self, file: IO[bytes]) -> str: """ - Invoke large language model - - :param model: model name - :param credentials: model credentials - :param prompt_messages: prompt messages - :param model_parameters: model parameters - :param tools: tools for tool calling - :param stop: stop words - :param stream: is stream response - :param user: unique user id - :return: full response or stream response chunk generator result + Detect the audio format based on file header """ + # Read the first few bytes to check the file signature + header = file.read(12) + file.seek(0) # Reset file pointer + + # Check for common audio format signatures + if header.startswith(b'RIFF') and header[8:12] == b'WAVE': + return 'wav' + elif header.startswith(b'ID3') or header.startswith(b'\xFF\xFB'): + return 'mp3' + elif header.startswith(b'OggS'): + return 'ogg' + elif header.startswith(b'fLaC'): + return 'flac' + else: + # Default or additional format checks + return 'mp3' # Default assumption +``` + + + + + Speech-to-text model identifier + + + + Authentication credentials for the API + + + + Binary file object containing the audio to transcribe + + + + User identifier for API monitoring + + + + + + The transcribed text from the audio file + + + + +Audio format detection is important for proper handling of different file types. Consider implementing a helper method to detect the format from the file header as shown in the example. + + + +Some speech-to-text APIs have file size limitations. Consider implementing chunking for large audio files if necessary. 
+ + +### Text2Speech Implementation + + +Text-to-speech models convert written text into natural-sounding speech, enabling applications such as voice assistants, screen readers, and audio content generation. + + +To implement a Text-to-Speech provider, inherit from the `__base.text2speech_model.Text2SpeechModel` base class: + + +```python Implementation +def _invoke( + self, + model: str, + credentials: dict, + content_text: str, + streaming: bool, + user: Optional[str] = None +) -> Union[bytes, Generator[bytes, None, None]]: + """ + Convert text to speech audio + """ + # Set up API client with credentials + client = self._get_client(credentials) + + # Get voice settings based on model + voice = self._get_voice_for_model(model) + + try: + # Choose implementation based on streaming preference + if streaming: + return self._stream_audio( + client=client, + model=model, + text=content_text, + voice=voice, + user=user + ) + else: + return self._generate_complete_audio( + client=client, + model=model, + text=content_text, + voice=voice, + user=user + ) + except Exception as e: + self._handle_api_error(e) ``` -* Parameters: - * `model` (string) Model name - * `credentials` (object) Credential information - -The credential parameters are defined by the provider YAML configuration file's `provider_credential_schema` or `model_credential_schema`, passed in as `api_key`, etc. - -* `prompt_messages` (array\[[PromptMessage](#promptmessage)]) Prompt list - -If the model is of `Completion` type, the list only needs to include one [UserPromptMessage](#userpromptmessage) element; if the model is of `Chat` type, different messages need to be passed in as a list of [SystemPromptMessage](#systempromptmessage), [UserPromptMessage](#userpromptmessage), [AssistantPromptMessage](#assistantpromptmessage), [ToolPromptMessage](#toolpromptmessage) elements - -* `model_parameters` (object) Model parameters defined by the model YAML configuration's `parameter_rules`. 
- -* `tools` (array\[[PromptMessageTool](#promptmessagetool)]) \[optional] Tool list, equivalent to `function` in `function calling`. This is the tool list passed to tool calling. - -* `stop` (array\[string]) \[optional] Stop sequence. The model response will stop output before the string defined in the stop sequence. - -* `stream` (bool) Whether to stream output, default is True -For streaming output, it returns Generator\[[LLMResultChunk](#llmresultchunk)], for non-streaming output, it returns [LLMResult](#llmresult). - -* `user` (string) \[optional] A unique identifier for the user that can help the provider monitor and detect abuse. - -* Return Value - -For streaming output, it returns Generator\[[LLMResultChunk](#llmresultchunk)], for non-streaming output, it returns [LLMResult](#llmresult). - -* Pre-calculate input tokens - -If the model does not provide a pre-calculation tokens interface, you can directly return 0. - -```python -def get_num_tokens(self, model: str, credentials: dict, prompt_messages: list[PromptMessage], - tools: Optional[list[PromptMessageTool]] = None) -> int: +```python Helper Methods +def _stream_audio(self, client, model, text, voice, user=None): """ - Get number of tokens for given prompt messages - - :param model: model name - :param credentials: model credentials - :param prompt_messages: prompt messages - :param tools: tools for tool calling - :return: + Implementation for streaming audio output """ + # Make API request with stream=True + response = client.audio.speech.create( + model=model, + voice=voice, + input=text, + stream=True, + user=user + ) + + # Yield chunks as they arrive + for chunk in response: + if chunk: + yield chunk + +def _generate_complete_audio(self, client, model, text, voice, user=None): + """ + Implementation for complete audio file generation + """ + # Make API request for complete audio + response = client.audio.speech.create( + model=model, + voice=voice, + input=text, + user=user + ) + + # Get audio data 
as bytes + audio_data = response.content + return audio_data +``` + + + + + Text-to-speech model identifier + + + + Authentication credentials for the API + + + + Text content to be converted to speech + + + + Whether to return streaming audio or complete file + + + + User identifier for API monitoring + + + + + + A generator yielding audio chunks as they become available + + + + Complete audio data as bytes + + + + +Most text-to-speech APIs require you to specify a voice along with the model. Consider implementing a mapping between Dify's model identifiers and the provider's voice options. + + + +Long text inputs may need to be chunked for better speech synthesis quality. Consider implementing text preprocessing to handle punctuation, numbers, and special characters properly. + + + +### Moderation Implementation + + +Moderation models analyze content for potentially harmful, inappropriate, or unsafe material, helping maintain platform safety and content policies. + + +To implement a Moderation provider, inherit from the `__base.moderation_model.ModerationModel` base class: + + +```python Implementation +def _invoke( + self, + model: str, + credentials: dict, + text: str, + user: Optional[str] = None +) -> bool: + """ + Analyze text for harmful content + + Returns: + bool: False if the text is safe, True if it contains harmful content + """ + # Set up API client with credentials + client = self._get_client(credentials) + + try: + # Call moderation API + response = client.moderations.create( + model=model, + input=text, + user=user + ) + + # Check if any categories were flagged + result = response.results[0] + + # Return True if flagged in any category, False if safe + return result.flagged + + except Exception as e: + # Log the error but default to safe if there's an API issue + # This is a conservative approach - production systems might want + # different fallback behavior + logger.error(f"Moderation API error: {str(e)}") + return False ``` -Parameter 
explanations are the same as in `LLM Invocation` above. This interface needs to calculate based on the appropriate `tokenizer` for the corresponding `model`. If the corresponding model does not provide a `tokenizer`, you can use the `_get_num_tokens_by_gpt2(text: str)` method in the `AIModel` base class for calculation.

* Get custom model rules \[optional]

```python
def get_customizable_model_schema(self, model: str, credentials: dict) -> Optional[AIModelEntity]:
    """
    Get customizable model schema

    :param model: model name
    :param credentials: model credentials
    :return: model schema
    """
```

When a provider supports adding custom LLMs, this method can be implemented to allow custom models to obtain model rules. By default, it returns None.

```python Detailed Implementation
def _invoke(
    self,
    model: str,
    credentials: dict,
    text: str,
    user: Optional[str] = None
) -> bool:
    """
    Analyze text for harmful content with detailed category checking
    """
    # Set up API client with credentials
    client = self._get_client(credentials)

    try:
        # Call moderation API
        response = client.moderations.create(
            model=model,
            input=text,
            user=user
        )

        # Get detailed category results
        result = response.results[0]
        categories = result.categories

        # Check specific categories based on your application's needs
        # For example, you might want to flag certain categories but not others
        critical_violations = [
            categories.harassment,
            categories.hate,
            categories.self_harm,
            categories.sexual,
            categories.violence
        ]

        # Flag content if any critical category is violated
        return any(critical_violations)

    except Exception as e:
        self._handle_api_error(e)
        # Default to safe in case of error
        return False
```
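Some providers return per-category scores rather than per-category booleans. As a complement to the moderation `_invoke` implementations in this section, a small helper can turn such scores into a flag decision. This is a minimal, self-contained sketch; the category names and threshold values are illustrative assumptions, not part of the Dify plugin API:

```python
# Hypothetical per-category thresholds; tune these for your application.
DEFAULT_THRESHOLDS = {
    "harassment": 0.5,
    "hate": 0.4,
    "self_harm": 0.3,
    "sexual": 0.5,
    "violence": 0.5,
}

def is_flagged(scores: dict[str, float],
               thresholds: dict[str, float] = DEFAULT_THRESHOLDS) -> bool:
    """Return True if any category score meets or exceeds its threshold."""
    return any(
        scores.get(category, 0.0) >= threshold
        for category, threshold in thresholds.items()
    )
```

Per-category thresholds let you be stricter on some categories (for example `self_harm`) without blocking borderline content in others.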
* Parameters:

* `model` (string) Moderation model identifier
* `credentials` (object) Authentication credentials for the API
* `text` (string) Text content to be analyzed
* `user` (string) \[optional] User identifier for API monitoring

* Return:

Boolean indicating content safety: False means the content is safe, True means it contains harmful material.

**Warning:** Moderation is often used as a safety mechanism. Consider the implications of false negatives (letting harmful content through) versus false positives (blocking safe content) when implementing your solution.

For most fine-tuned models under the `OpenAI` provider, the base model can be obtained through the fine-tuned model name, such as `gpt-3.5-turbo-1106`, and then the predefined parameter rules of the base model can be returned. Refer to the specific implementation of [OpenAI](https://github.com/langgenius/dify-official-plugins/tree/main/models/openai).

#### TextEmbedding

Inherit the `__base.text_embedding_model.TextEmbeddingModel` base class and implement the following interface:

* Embedding Invocation

```python
def _invoke(self, model: str, credentials: dict,
            texts: list[str], user: Optional[str] = None) \
        -> TextEmbeddingResult:
    """
    Invoke text embedding model

    :param model: model name
    :param credentials: model credentials
    :param texts: texts to embed
    :param user: unique user id
    :return: embeddings result
    """
```

* Parameters:

* `model` (string) Model name
* `credentials` (object) Credential information. The credential parameters are defined by the provider YAML configuration file's `provider_credential_schema` or `model_credential_schema`, passed in as `api_key`, etc.
* `texts` (array\[string]) Text list, can be processed in batch
* `user` (string) \[optional] A unique identifier for the user that can help the provider monitor and detect abuse.

* Return:

[TextEmbeddingResult](#textembeddingresult) entity.
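Because `texts` can contain many items while providers typically cap the number of inputs per request, implementations usually split the list into batches before calling the API. A minimal sketch; the default batch size of 16 is an arbitrary assumption rather than a Dify or provider constant:

```python
def split_into_batches(texts: list[str], batch_size: int = 16) -> list[list[str]]:
    """Split the input texts into batches no larger than batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
```

Each batch is then embedded with one API call, and the resulting vectors are concatenated back in the original input order before building the `TextEmbeddingResult`.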
* Pre-calculate tokens

```python
def get_num_tokens(self, model: str, credentials: dict, texts: list[str]) -> int:
    """
    Get number of tokens for given texts

    :param model: model name
    :param credentials: model credentials
    :param texts: texts to embed
    :return: number of tokens
    """
```

Parameter explanations can be found in the `Embedding Invocation` section above.

Similar to the `LargeLanguageModel` above, this interface needs to calculate based on the appropriate `tokenizer` for the corresponding `model`. If the corresponding model does not provide a `tokenizer`, you can use the `_get_num_tokens_by_gpt2(text: str)` method in the `AIModel` base class for calculation.

#### Rerank

Inherit the `__base.rerank_model.RerankModel` base class and implement the following interface:

* Rerank Invocation

```python
def _invoke(self, model: str, credentials: dict,
            query: str, docs: list[str], score_threshold: Optional[float] = None,
            top_n: Optional[int] = None, user: Optional[str] = None) \
        -> RerankResult:
    """
    Invoke rerank model

    :param model: model name
    :param credentials: model credentials
    :param query: search query
    :param docs: docs for reranking
    :param score_threshold: score threshold
    :param top_n: top n
    :param user: unique user id
    :return: rerank result
    """
```

* Parameters:

* `model` (string) Model name
* `credentials` (object) Credential information. The credential parameters are defined by the provider YAML configuration file's `provider_credential_schema` or `model_credential_schema`, passed in as `api_key`, etc.
* `query` (string) Query request content
* `docs` (array\[string]) List of segments that need to be reranked
* `score_threshold` (float) \[optional] Score threshold
* `top_n` (int) \[optional] Take the top n segments
* `user` (string) \[optional] A unique identifier for the user that can help the provider monitor and detect abuse.

* Return:

[RerankResult](#rerankresult) entity.
#### Speech2text

Inherit the `__base.speech2text_model.Speech2TextModel` base class and implement the following interface:

* Invoke

```python
def _invoke(self, model: str, credentials: dict,
            file: IO[bytes], user: Optional[str] = None) \
        -> str:
    """
    Invoke speech-to-text model

    :param model: model name
    :param credentials: model credentials
    :param file: audio file
    :param user: unique user id
    :return: text for given audio file
    """
```

* Parameters:

* `model` (string) Model name
* `credentials` (object) Credential information. The credential parameters are defined by the provider YAML configuration file's `provider_credential_schema` or `model_credential_schema`, passed in as `api_key`, etc.
* `file` (File) Audio file stream
* `user` (string) \[optional] A unique identifier for the user that can help the provider monitor and detect abuse.

* Return:

The text transcribed from the given audio file.

#### Text2speech

Inherit the `__base.text2speech_model.Text2SpeechModel` base class and implement the following interface:

* Invoke

```python
def _invoke(self, model: str, credentials: dict, content_text: str, streaming: bool, user: Optional[str] = None):
    """
    Invoke text-to-speech model

    :param model: model name
    :param credentials: model credentials
    :param content_text: text content to be converted
    :param streaming: whether the output is streaming
    :param user: unique user id
    :return: audio for the given text
    """
```

* Parameters:

* `model` (string) Model name
* `credentials` (object) Credential information. The credential parameters are defined by the provider YAML configuration file's `provider_credential_schema` or `model_credential_schema`, passed in as `api_key`, etc.
* `content_text` (string) Text content to be converted
* `streaming` (bool) Whether to stream output
* `user` (string) \[optional] A unique identifier for the user that can help the provider monitor and detect abuse.
* Return:

Audio stream after text conversion.

#### Moderation

Inherit the `__base.moderation_model.ModerationModel` base class and implement the following interface:

* Invoke

```python
def _invoke(self, model: str, credentials: dict,
            text: str, user: Optional[str] = None) \
        -> bool:
    """
    Invoke moderation model

    :param model: model name
    :param credentials: model credentials
    :param text: text to moderate
    :param user: unique user id
    :return: false if text is safe, true otherwise
    """
```

* Parameters:

* `model` (string) Model name
* `credentials` (object) Credential information. The credential parameters are defined by the provider YAML configuration file's `provider_credential_schema` or `model_credential_schema`, passed in as `api_key`, etc.
* `text` (string) Text content to moderate
* `user` (string) \[optional] A unique identifier for the user that can help the provider monitor and detect abuse.

* Return:

False indicates the input text is safe, True indicates it is not.

**Note:** Many moderation APIs provide detailed category scores rather than just a binary result. Consider extending this implementation to return more detailed information about specific categories of harmful content if your application needs it.

### Entities