From 8a82b7341e361856a67dd69264962863f60283cf Mon Sep 17 00:00:00 2001
From: Marcel Klehr
Date: Thu, 13 Jun 2024 09:50:46 +0200
Subject: [PATCH 1/6] fix(admin/AI): Small additions

Signed-off-by: Marcel Klehr
---
 admin_manual/ai/app_context_chat.rst |  2 ++
 admin_manual/ai/app_llm2.rst         | 22 +++++++++++++++-------
 admin_manual/ai/app_stt_whisper2.rst |  8 +++++---
 3 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/admin_manual/ai/app_context_chat.rst b/admin_manual/ai/app_context_chat.rst
index 53eeb8c2c..b3ccc5813 100644
--- a/admin_manual/ai/app_context_chat.rst
+++ b/admin_manual/ai/app_context_chat.rst
@@ -13,6 +13,8 @@ Together they provide the ContextChat text processing tasks accessible via the :
 The *context_chat* and *context_chat_backend* apps run only open source models and do so entirely on-premises. Nextcloud can provide customer support upon request, please talk to your account manager for the possibilities.
+This app supports input and output in other languages than English, if the language model supports the language.
+
 Requirements
 ------------
diff --git a/admin_manual/ai/app_llm2.rst b/admin_manual/ai/app_llm2.rst
index 2c78c11e4..e074d3545 100644
--- a/admin_manual/ai/app_llm2.rst
+++ b/admin_manual/ai/app_llm2.rst
@@ -4,17 +4,19 @@ App: Local large language model (llm2)
 .. _ai-app-llm2:
-The *llm2* app is one of the apps that provide text processing functionality using Large language models in Nextcloud and act as a text processing backend for the :ref:`Nextcloud Assistant app`, the *mail* app and :ref:`other apps making use of the core Translation API`. The *llm2* app specifically runs only open source models and does so entirely on-premises. Nextcloud can provide customer support upon request, please talk to your account manager for the possibilities.
+The *llm2* app is one of the apps that provide text processing functionality using Large language models in Nextcloud and act as a text processing backend for the :ref:`Nextcloud Assistant app`, the *mail* app and :ref:`other apps making use of the core Text Processing API`. The *llm2* app specifically runs only open source models and does so entirely on-premises. Nextcloud can provide customer support upon request, please talk to your account manager for the possibilities.
 This app uses `ctransformers `_ under the hood and is thus compatible with any model in *gguf* format.
 Output quality will differ depending on which model you use, we recommend the following models:
-* `Llama2 7b Chat `_ (Slightly older; good quality; good acclaim)
-* `NeuralBeagle14 7B `_ (Newer; good quality; less well known)
+* `Llama3 8b Instruct `_ (reasonable quality; fast; good acclaim; multilingual output may not be optimal)
+* `Llama3 70B Instruct `_ (good quality; good acclaim; good multilingual output)
+
+This app supports input and output in other languages than English, if the underlying model supports the language.
 Requirements
 ------------
-* This app is built as an External App and thus depends on AppAPI v2.3.0
+* This app is built as an External App and thus depends on AppAPI v2.3.0 or higher
 * Nextcloud AIO is supported
 * Using GPU processing is supported, but not required; be prepared for slow performance unless you are using GPU
 * We currently only support NVIDIA GPUs
@@ -22,9 +24,14 @@ Requirements
 * You will need a GPU with enough VRAM to hold the model you choose
- * for 7B parameter models, 5bit-quantized variants and lower should fit on a 8GB VRAM, but of course have lower quality
- * for 7B parameter models, 6bit-quantized variants and up will need 12GB VRAM
+ * here are some examples:
+
+ * for 8B parameter models, 5bit-quantized variants and lower should fit on a 8GB VRAM, but of course have slightly lower quality
+ * for 8B parameter models, 6bit-quantized variants and up will need 12GB VRAM
+ * for a 70B parameter model 4bit-quantized variants will need ~44GB VRAM
+ * If you want better reasoning capabilities, you will need to look for models with more parameters, like 14B and higher, which of course also need more VRAM
+ * To check whether a model fits on your GPU, you can use this `calculator `_
 * CPU Sizing
@@ -69,7 +76,8 @@ Nextcloud customers should file bugs directly with our Support system.
 Known Limitations
 -----------------
-* We currently only support the English language
+* We currently only support languages that the underlying model supports; correctness of language use in other languages than English may be poor depending on the language's coverage in the model's training data (We recommended model Llama 3 or other models explicitly trained on multiple languages)
+* Language models can be bad at reasoning tasks
 * Language models are likely to generate false information and should thus only be used in situations that are not critical. It's recommended to only use AI at the beginning of a creation process and not at the end, so that outputs of AI serve as a draft for example and not as final product. Always check the output of language models before using it.
 * Make sure to test the language model you are using it for whether it meets the use-case's quality requirements
 * Language models notoriously have a high energy consumption, if you want to reduce load on your server you can choose smaller models or quantized models in excahnge for lower accuracy
diff --git a/admin_manual/ai/app_stt_whisper2.rst b/admin_manual/ai/app_stt_whisper2.rst
index 4ef327dd0..0b901a4d2 100644
--- a/admin_manual/ai/app_stt_whisper2.rst
+++ b/admin_manual/ai/app_stt_whisper2.rst
@@ -6,10 +6,12 @@ App: Local Whisper Speech-To-Text (stt_whisper2)
 The *stt_whisper2* app is one of the apps that provide Speech-To-Text functionality in Nextcloud and act as a media transcription backend for the :ref:`Nextcloud Assistant app`, the *talk* app and :ref:`other apps making use of the core Translation API`. The *stt_whisper2* app specifically runs only open source models and does so entirely on-premises. Nextcloud can provide customer support upon request, please talk to your account manager for the possibilities.
+This app supports input and output in other languages than English, if the underlying model supports the language.
+
 This app uses `faster-whisper `_ under the hood.
 Output quality will differ depending on which model you use, we recommend the following models:
- * OpenAI Whisper large-v2 or v3
- * OpenAI Whisper medium.en
+ * OpenAI Whisper large-v2 or v3 (multilingual)
+ * OpenAI Whisper medium.en (only supports English)
 Requirements
 ------------
@@ -44,7 +46,7 @@ Installation
 Supplying alternate models
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
-This app allows supplying alternate LLM models as *gguf* files in the ``/nc_app_llm2_data`` directory of the docker container. You can use any `*faster-whisper* model by Systran on hugging face `_ by simply
+This app allows supplying alternate models in the ``/nc_app_llm2_data`` directory of the docker container. You can use any `*faster-whisper* model by Systran on hugging face `_ by simply
 1. git cloning the respective repository
 2. Copying the folder with the git repository to ``/nc_app_llm2_data`` inside the docker container.

From 8d3184980990245c3d7db3e8842eb2887942a6a5 Mon Sep 17 00:00:00 2001
From: Marcel Klehr
Date: Mon, 24 Jun 2024 12:39:21 +0200
Subject: [PATCH 2/6] Update admin_manual/ai/app_stt_whisper2.rst

Co-authored-by: Josh
Signed-off-by: Marcel Klehr
---
 admin_manual/ai/app_stt_whisper2.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/admin_manual/ai/app_stt_whisper2.rst b/admin_manual/ai/app_stt_whisper2.rst
index 0b901a4d2..5b51d1fc6 100644
--- a/admin_manual/ai/app_stt_whisper2.rst
+++ b/admin_manual/ai/app_stt_whisper2.rst
@@ -46,7 +46,7 @@ Installation
 Supplying alternate models
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
-This app allows supplying alternate models in the ``/nc_app_llm2_data`` directory of the docker container. You can use any `*faster-whisper* model by Systran on hugging face `_ by simply
+This app allows supplying alternate models in the ``/nc_app_llm2_data`` directory of the docker container. You can use any `*faster-whisper* model by Systran on hugging face `_ in the following way:
 1. git cloning the respective repository
 2. Copying the folder with the git repository to ``/nc_app_llm2_data`` inside the docker container.

From 5c881f087ce83f9c493d47ea2c313872520bfe26 Mon Sep 17 00:00:00 2001
From: Marcel Klehr
Date: Mon, 24 Jun 2024 12:39:31 +0200
Subject: [PATCH 3/6] Update admin_manual/ai/app_stt_whisper2.rst

Co-authored-by: Josh
Signed-off-by: Marcel Klehr
---
 admin_manual/ai/app_stt_whisper2.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/admin_manual/ai/app_stt_whisper2.rst b/admin_manual/ai/app_stt_whisper2.rst
index 5b51d1fc6..1557036ab 100644
--- a/admin_manual/ai/app_stt_whisper2.rst
+++ b/admin_manual/ai/app_stt_whisper2.rst
@@ -11,7 +11,7 @@ This app supports input and output in other languages than English, if the under
 This app uses `faster-whisper `_ under the hood.
 Output quality will differ depending on which model you use, we recommend the following models:
  * OpenAI Whisper large-v2 or v3 (multilingual)
- * OpenAI Whisper medium.en (only supports English)
+ * OpenAI Whisper medium.en (English only)
 Requirements
 ------------

From 2995d98bf375da6947f59b832224fe85d3a80b83 Mon Sep 17 00:00:00 2001
From: Marcel Klehr
Date: Mon, 24 Jun 2024 12:39:41 +0200
Subject: [PATCH 4/6] Update admin_manual/ai/app_llm2.rst

Co-authored-by: Josh
Signed-off-by: Marcel Klehr
---
 admin_manual/ai/app_llm2.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/admin_manual/ai/app_llm2.rst b/admin_manual/ai/app_llm2.rst
index e074d3545..67b4d68d6 100644
--- a/admin_manual/ai/app_llm2.rst
+++ b/admin_manual/ai/app_llm2.rst
@@ -24,7 +24,7 @@ Requirements
 * You will need a GPU with enough VRAM to hold the model you choose
- * here are some examples:
+ * Some examples:
  * for 8B parameter models, 5bit-quantized variants and lower should fit on a 8GB VRAM, but of course have slightly lower quality
  * for 8B parameter models, 6bit-quantized variants and up will need 12GB VRAM

From 692751789993b77b6778e076910aa339df8473ce Mon Sep 17 00:00:00 2001
From: Marcel Klehr
Date: Mon, 24 Jun 2024 12:39:58 +0200
Subject: [PATCH 5/6] Update admin_manual/ai/app_context_chat.rst

Co-authored-by: Josh
Signed-off-by: Marcel Klehr
---
 admin_manual/ai/app_context_chat.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/admin_manual/ai/app_context_chat.rst b/admin_manual/ai/app_context_chat.rst
index b3ccc5813..97e9d3367 100644
--- a/admin_manual/ai/app_context_chat.rst
+++ b/admin_manual/ai/app_context_chat.rst
@@ -13,7 +13,7 @@ Together they provide the ContextChat text processing tasks accessible via the :
 The *context_chat* and *context_chat_backend* apps run only open source models and do so entirely on-premises. Nextcloud can provide customer support upon request, please talk to your account manager for the possibilities.
-This app supports input and output in other languages than English, if the language model supports the language.
+This app supports input and output in languages other than English if the language model supports the language.
 Requirements
 ------------

From fe598387c347f880b7394aba93de5e35299ddbde Mon Sep 17 00:00:00 2001
From: Marcel Klehr
Date: Mon, 24 Jun 2024 12:40:25 +0200
Subject: [PATCH 6/6] Apply suggestions from code review

Co-authored-by: Josh
Signed-off-by: Marcel Klehr
---
 admin_manual/ai/app_llm2.rst         | 4 ++--
 admin_manual/ai/app_stt_whisper2.rst | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/admin_manual/ai/app_llm2.rst b/admin_manual/ai/app_llm2.rst
index 67b4d68d6..28131a1dc 100644
--- a/admin_manual/ai/app_llm2.rst
+++ b/admin_manual/ai/app_llm2.rst
@@ -11,7 +11,7 @@ This app uses `ctransformers `_ under the hood
 * `Llama3 8b Instruct `_ (reasonable quality; fast; good acclaim; multilingual output may not be optimal)
 * `Llama3 70B Instruct `_ (good quality; good acclaim; good multilingual output)
-This app supports input and output in other languages than English, if the underlying model supports the language.
+This app supports input and output in languages other than English if the underlying model supports the language.
 Requirements
 ------------
@@ -76,7 +76,7 @@ Nextcloud customers should file bugs directly with our Support system.
 Known Limitations
 -----------------
-* We currently only support languages that the underlying model supports; correctness of language use in other languages than English may be poor depending on the language's coverage in the model's training data (We recommended model Llama 3 or other models explicitly trained on multiple languages)
+* We currently only support languages that the underlying model supports; correctness of language use in languages other than English may be poor depending on the language's coverage in the model's training data (We recommended model Llama 3 or other models explicitly trained on multiple languages)
 * Language models can be bad at reasoning tasks
 * Language models are likely to generate false information and should thus only be used in situations that are not critical. It's recommended to only use AI at the beginning of a creation process and not at the end, so that outputs of AI serve as a draft for example and not as final product. Always check the output of language models before using it.
 * Make sure to test the language model you are using it for whether it meets the use-case's quality requirements
diff --git a/admin_manual/ai/app_stt_whisper2.rst b/admin_manual/ai/app_stt_whisper2.rst
index 1557036ab..0fddadad5 100644
--- a/admin_manual/ai/app_stt_whisper2.rst
+++ b/admin_manual/ai/app_stt_whisper2.rst
@@ -6,7 +6,7 @@ App: Local Whisper Speech-To-Text (stt_whisper2)
 The *stt_whisper2* app is one of the apps that provide Speech-To-Text functionality in Nextcloud and act as a media transcription backend for the :ref:`Nextcloud Assistant app`, the *talk* app and :ref:`other apps making use of the core Translation API`. The *stt_whisper2* app specifically runs only open source models and does so entirely on-premises. Nextcloud can provide customer support upon request, please talk to your account manager for the possibilities.
-This app supports input and output in other languages than English, if the underlying model supports the language.
+This app supports input and output in languages other than English if the underlying model supports the language.
 This app uses `faster-whisper `_ under the hood.
 Output quality will differ depending on which model you use, we recommend the following models: