Merge pull request #11907 from nextcloud/fix/ai/small-additions2

fix(admin/AI): Small additions
This commit is contained in:
Marcel Klehr
2024-06-24 12:44:12 +02:00
committed by GitHub
3 changed files with 22 additions and 10 deletions

View File

@@ -13,6 +13,8 @@ Together they provide the ContextChat text processing tasks accessible via the :
The *context_chat* and *context_chat_backend* apps run only open source models and do so entirely on-premises. Nextcloud can provide customer support upon request; please talk to your account manager for the possibilities.
This app supports input and output in languages other than English if the language model supports the language.
Requirements
------------

View File

@@ -4,17 +4,19 @@ App: Local large language model (llm2)
.. _ai-app-llm2:
The *llm2* app is one of the apps that provide text processing functionality using large language models in Nextcloud and act as a text processing backend for the :ref:`Nextcloud Assistant app<ai-app-assistant>`, the *mail* app and :ref:`other apps making use of the core Translation API<tp-consumer-apps>`. The *llm2* app specifically runs only open source models and does so entirely on-premises. Nextcloud can provide customer support upon request; please talk to your account manager for the possibilities.
The *llm2* app is one of the apps that provide text processing functionality using large language models in Nextcloud and act as a text processing backend for the :ref:`Nextcloud Assistant app<ai-app-assistant>`, the *mail* app and :ref:`other apps making use of the core Text Processing API<tp-consumer-apps>`. The *llm2* app specifically runs only open source models and does so entirely on-premises. Nextcloud can provide customer support upon request; please talk to your account manager for the possibilities.
This app uses `ctransformers <https://github.com/marella/ctransformers>`_ under the hood and is thus compatible with any model in *gguf* format. Output quality will differ depending on which model you use; we recommend the following models (a minimal loading sketch follows below):
* `Llama2 7b Chat <https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF>`_ (Slightly older; good quality; good acclaim)
* `NeuralBeagle14 7B <https://huggingface.co/mlabonne/NeuralBeagle14-7B-GGUF>`_ (Newer; good quality; less well known)
* `Llama3 8b Instruct <https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF>`_ (reasonable quality; fast; good acclaim; multilingual output may not be optimal)
* `Llama3 70B Instruct <https://huggingface.co/QuantFactory/Meta-Llama-3-70B-Instruct-GGUF>`_ (good quality; good acclaim; good multilingual output)
This app supports input and output in languages other than English if the underlying model supports the language.
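For illustration only, here is a minimal sketch of how a *gguf* model can be loaded with *ctransformers* directly; the repository and file names below are examples taken from the recommendations above, and the app performs this step internally:

.. code-block:: python

    from ctransformers import AutoModelForCausalLM

    # Example Hugging Face repo and quantized file name; any model in
    # *gguf* format should work the same way.
    llm = AutoModelForCausalLM.from_pretrained(
        "TheBloke/Llama-2-7B-Chat-GGUF",
        model_file="llama-2-7b-chat.Q5_K_M.gguf",
        model_type="llama",
    )

    # Generate a short completion from a prompt.
    print(llm("Nextcloud is", max_new_tokens=32))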
Requirements
------------
* This app is built as an External App and thus depends on AppAPI v2.3.0
* This app is built as an External App and thus depends on AppAPI v2.3.0 or higher
* Nextcloud AIO is supported
* Using GPU processing is supported, but not required; be prepared for slow performance unless you are using a GPU
* We currently only support NVIDIA GPUs
@@ -22,9 +24,14 @@ Requirements
* You will need a GPU with enough VRAM to hold the model you choose
* for 7B parameter models, 5bit-quantized variants and lower should fit in 8GB of VRAM, but of course have lower quality
* for 7B parameter models, 6bit-quantized variants and up will need 12GB of VRAM
* Some examples:
* for 8B parameter models, 5bit-quantized variants and lower should fit in 8GB of VRAM, but of course have slightly lower quality
* for 8B parameter models, 6bit-quantized variants and up will need 12GB of VRAM
* for a 70B parameter model, 4bit-quantized variants will need ~44GB of VRAM
* If you want better reasoning capabilities, you will need to look for models with more parameters, like 14B and higher, which of course also need more VRAM
* To check whether a model fits on your GPU, you can use this `calculator <https://rahulschand.github.io/gpu_poor/>`_; a rough back-of-the-envelope sketch follows this list
* CPU Sizing
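As a rough back-of-the-envelope estimate (an assumption for illustration, not the method used by the linked calculator): the weights alone need about ``parameters × bits-per-weight / 8`` bytes, plus some overhead for activations and context:

.. code-block:: python

    def estimate_vram_gb(params_billions: float, quant_bits: float,
                         overhead: float = 0.2) -> float:
        """Very rough VRAM estimate: weight size plus a fixed overhead margin."""
        weight_gb = params_billions * quant_bits / 8
        return weight_gb * (1 + overhead)

    print(estimate_vram_gb(8, 5))   # ~6.0  -> fits in 8GB of VRAM
    print(estimate_vram_gb(70, 4))  # ~42.0 -> close to the ~44GB figure above

Actual usage also grows with context length, so leave headroom beyond these numbers.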
@@ -69,7 +76,8 @@ Nextcloud customers should file bugs directly with our Support system.
Known Limitations
-----------------
* We currently only support the English language
* We currently only support languages that the underlying model supports; correctness of language use in languages other than English may be poor, depending on the language's coverage in the model's training data (we recommend Llama 3 or other models explicitly trained on multiple languages)
* Language models can be bad at reasoning tasks
* Language models are likely to generate false information and should thus only be used in situations that are not critical. It is recommended to use AI only at the beginning of a creation process rather than at the end, so that its outputs serve as a draft, for example, and not as the final product. Always check the output of language models before using it.
* Make sure to test whether the language model you are using meets your use case's quality requirements
* Language models notoriously have high energy consumption; if you want to reduce the load on your server, you can choose smaller or quantized models in exchange for lower accuracy

View File

@@ -6,10 +6,12 @@ App: Local Whisper Speech-To-Text (stt_whisper2)
The *stt_whisper2* app is one of the apps that provide Speech-To-Text functionality in Nextcloud and act as a media transcription backend for the :ref:`Nextcloud Assistant app<ai-app-assistant>`, the *talk* app and :ref:`other apps making use of the core Speech-To-Text API<stt-consumer-apps>`. The *stt_whisper2* app specifically runs only open source models and does so entirely on-premises. Nextcloud can provide customer support upon request; please talk to your account manager for the possibilities.
This app supports input and output in languages other than English if the underlying model supports the language.
This app uses `faster-whisper <https://github.com/SYSTRAN/faster-whisper>`_ under the hood. Output quality will differ depending on which model you use; we recommend the following models (a short transcription sketch follows the list):
* OpenAI Whisper large-v2 or v3
* OpenAI Whisper medium.en
* OpenAI Whisper large-v2 or v3 (multilingual)
* OpenAI Whisper medium.en (English only)
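For illustration, a minimal transcription sketch using *faster-whisper* directly; the audio file name is a placeholder, and the app runs this step internally:

.. code-block:: python

    from faster_whisper import WhisperModel

    # "large-v2" is one of the recommended multilingual models above.
    model = WhisperModel("large-v2", device="cpu", compute_type="int8")

    # "meeting.mp3" is a placeholder input file.
    segments, info = model.transcribe("meeting.mp3")

    print("Detected language:", info.language)
    for segment in segments:
        print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")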
Requirements
------------
@@ -44,7 +46,7 @@ Installation
Supplying alternate models
~~~~~~~~~~~~~~~~~~~~~~~~~~
This app allows supplying alternate LLM models as *gguf* files in the ``/nc_app_llm2_data`` directory of the docker container. You can use any `*faster-whisper* model by Systran on Hugging Face <https://huggingface.co/Systran>`_ by simply
This app allows supplying alternate models in the ``/nc_app_llm2_data`` directory of the docker container. You can use any `*faster-whisper* model by Systran on Hugging Face <https://huggingface.co/Systran>`_ in the following way (a scripted sketch follows these steps):
1. Clone the respective git repository
2. Copy the folder with the git repository to ``/nc_app_llm2_data`` inside the docker container.
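As a sketch of these two steps scripted via Python, under stated assumptions: the model repository below is an example, and the container name is an assumption; check ``docker ps`` for the actual name on your system:

.. code-block:: python

    import subprocess

    # Example Systran faster-whisper repository; git-lfs must be installed
    # so the model weights are actually downloaded during the clone.
    repo = "https://huggingface.co/Systran/faster-whisper-small"

    # 1. Clone the model repository locally.
    subprocess.run(["git", "clone", repo], check=True)

    # 2. Copy the cloned folder into the app container's data directory.
    #    "nc_app_stt_whisper2" is an assumed container name.
    subprocess.run(
        ["docker", "cp", "faster-whisper-small",
         "nc_app_stt_whisper2:/nc_app_llm2_data/"],
        check=True,
    )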