Merge pull request #11907 from nextcloud/fix/ai/small-additions2

fix(admin/AI): Small additions
This commit is contained in:
Marcel Klehr
2024-06-24 12:44:12 +02:00
committed by GitHub
3 changed files with 22 additions and 10 deletions

View File

@@ -13,6 +13,8 @@ Together they provide the ContextChat text processing tasks accessible via the :
The *context_chat* and *context_chat_backend* apps run only open source models and do so entirely on-premises. Nextcloud can provide customer support upon request; please talk to your account manager for the possibilities.
This app supports input and output in languages other than English if the language model supports the language.
Requirements
------------

View File

@@ -4,17 +4,19 @@ App: Local large language model (llm2)
.. _ai-app-llm2:
The *llm2* app is one of the apps that provide text processing functionality using large language models in Nextcloud and act as a text processing backend for the :ref:`Nextcloud Assistant app<ai-app-assistant>`, the *mail* app and :ref:`other apps making use of the core Translation API<tp-consumer-apps>`. The *llm2* app specifically runs only open source models and does so entirely on-premises. Nextcloud can provide customer support upon request; please talk to your account manager for the possibilities.
The *llm2* app is one of the apps that provide text processing functionality using large language models in Nextcloud and act as a text processing backend for the :ref:`Nextcloud Assistant app<ai-app-assistant>`, the *mail* app and :ref:`other apps making use of the core Text Processing API<tp-consumer-apps>`. The *llm2* app specifically runs only open source models and does so entirely on-premises. Nextcloud can provide customer support upon request; please talk to your account manager for the possibilities.
This app uses `ctransformers <https://github.com/marella/ctransformers>`_ under the hood and is thus compatible with any model in *gguf* format. Output quality will differ depending on which model you use; we recommend the following models (a minimal loading sketch follows below):
* `Llama2 7b Chat <https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF>`_ (Slightly older; good quality; good acclaim)
* `NeuralBeagle14 7B <https://huggingface.co/mlabonne/NeuralBeagle14-7B-GGUF>`_ (Newer; good quality; less well known)
* `Llama3 8b Instruct <https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF>`_ (reasonable quality; fast; good acclaim; multilingual output may not be optimal)
* `Llama3 70B Instruct <https://huggingface.co/QuantFactory/Meta-Llama-3-70B-Instruct-GGUF>`_ (good quality; good acclaim; good multilingual output)
This app supports input and output in languages other than English if the underlying model supports the language.
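For illustration only, here is a minimal sketch of how a *gguf* model can be loaded with *ctransformers* directly; the repository and file names below are examples taken from the recommendations above, and the app performs this step internally:

.. code-block:: python

    from ctransformers import AutoModelForCausalLM

    # Example Hugging Face repo and quantized file name; any model in
    # *gguf* format should work the same way.
    llm = AutoModelForCausalLM.from_pretrained(
        "TheBloke/Llama-2-7B-Chat-GGUF",
        model_file="llama-2-7b-chat.Q5_K_M.gguf",
        model_type="llama",
    )

    # Generate a short completion from a prompt.
    print(llm("Nextcloud is", max_new_tokens=32))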
Requirements
------------
* This app is built as an External App and thus depends on AppAPI v2.3.0
* This app is built as an External App and thus depends on AppAPI v2.3.0 or higher
* Nextcloud AIO is supported
* Using GPU processing is supported, but not required; be prepared for slow performance unless you are using a GPU
* We currently only support NVIDIA GPUs
@@ -22,9 +24,14 @@ Requirements
* You will need a GPU with enough VRAM to hold the model you choose
* for 7B parameter models, 5bit-quantized variants and lower should fit in 8GB of VRAM, but of course have lower quality
* for 7B parameter models, 6bit-quantized variants and up will need 12GB of VRAM
* Some examples:
* for 8B parameter models, 5bit-quantized variants and lower should fit in 8GB of VRAM, but of course have slightly lower quality
* for 8B parameter models, 6bit-quantized variants and up will need 12GB of VRAM
* for a 70B parameter model, 4bit-quantized variants will need ~44GB of VRAM
* If you want better reasoning capabilities, you will need to look for models with more parameters, like 14B and higher, which of course also need more VRAM
* To check whether a model fits on your GPU, you can use this `calculator <https://rahulschand.github.io/gpu_poor/>`_; a rough back-of-the-envelope sketch follows this list
* CPU Sizing
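As a rough back-of-the-envelope estimate (an assumption for illustration, not the method used by the linked calculator): the weights alone need about ``parameters × bits-per-weight / 8`` bytes, plus some overhead for activations and context:

.. code-block:: python

    def estimate_vram_gb(params_billions: float, quant_bits: float,
                         overhead: float = 0.2) -> float:
        """Very rough VRAM estimate: weight size plus a fixed overhead margin."""
        weight_gb = params_billions * quant_bits / 8
        return weight_gb * (1 + overhead)

    print(estimate_vram_gb(8, 5))   # ~6.0  -> fits in 8GB of VRAM
    print(estimate_vram_gb(70, 4))  # ~42.0 -> close to the ~44GB figure above

Actual usage also grows with context length, so leave headroom beyond these numbers.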
@@ -69,7 +76,8 @@ Nextcloud customers should file bugs directly with our Support system.
Known Limitations
-----------------
* We currently only support the English language
* We currently only support languages that the underlying model supports; correctness of language use in languages other than English may be poor, depending on the language's coverage in the model's training data (we recommend Llama 3 or other models explicitly trained on multiple languages)
* Language models can be bad at reasoning tasks
* Language models are likely to generate false information and should thus only be used in situations that are not critical. It is recommended to use AI only at the beginning of a creation process rather than at the end, so that its outputs serve as a draft, for example, and not as the final product. Always check the output of language models before using it.
* Make sure to test whether the language model you are using meets your use case's quality requirements
* Language models notoriously have high energy consumption; if you want to reduce the load on your server, you can choose smaller or quantized models in exchange for lower accuracy

View File

@@ -6,10 +6,12 @@ App: Local Whisper Speech-To-Text (stt_whisper2)
The *stt_whisper2* app is one of the apps that provide Speech-To-Text functionality in Nextcloud and act as a media transcription backend for the :ref:`Nextcloud Assistant app<ai-app-assistant>`, the *talk* app and :ref:`other apps making use of the core Speech-To-Text API<stt-consumer-apps>`. The *stt_whisper2* app specifically runs only open source models and does so entirely on-premises. Nextcloud can provide customer support upon request; please talk to your account manager for the possibilities.
This app supports input and output in languages other than English if the underlying model supports the language.
This app uses `faster-whisper <https://github.com/SYSTRAN/faster-whisper>`_ under the hood. Output quality will differ depending on which model you use; we recommend the following models (a short transcription sketch follows the list):
* OpenAI Whisper large-v2 or v3
* OpenAI Whisper medium.en
* OpenAI Whisper large-v2 or v3 (multilingual)
* OpenAI Whisper medium.en (English only)
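For illustration, a minimal transcription sketch using *faster-whisper* directly; the audio file name is a placeholder, and the app runs this step internally:

.. code-block:: python

    from faster_whisper import WhisperModel

    # "large-v2" is one of the recommended multilingual models above.
    model = WhisperModel("large-v2", device="cpu", compute_type="int8")

    # "meeting.mp3" is a placeholder input file.
    segments, info = model.transcribe("meeting.mp3")

    print("Detected language:", info.language)
    for segment in segments:
        print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")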
Requirements
------------
@@ -44,7 +46,7 @@ Installation
Supplying alternate models
~~~~~~~~~~~~~~~~~~~~~~~~~~
This app allows supplying alternate LLM models as *gguf* files in the ``/nc_app_llm2_data`` directory of the docker container. You can use any `*faster-whisper* model by Systran on Hugging Face <https://huggingface.co/Systran>`_ by simply
This app allows supplying alternate models in the ``/nc_app_llm2_data`` directory of the docker container. You can use any `*faster-whisper* model by Systran on Hugging Face <https://huggingface.co/Systran>`_ in the following way (a scripted sketch follows these steps):
1. Clone the respective git repository
2. Copy the folder with the git repository to ``/nc_app_llm2_data`` inside the docker container.
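As a sketch of these two steps scripted via Python, under stated assumptions: the model repository below is an example, and the container name is an assumption; check ``docker ps`` for the actual name on your system:

.. code-block:: python

    import subprocess

    # Example Systran faster-whisper repository; git-lfs must be installed
    # so the model weights are actually downloaded during the clone.
    repo = "https://huggingface.co/Systran/faster-whisper-small"

    # 1. Clone the model repository locally.
    subprocess.run(["git", "clone", repo], check=True)

    # 2. Copy the cloned folder into the app container's data directory.
    #    "nc_app_stt_whisper2" is an assumed container name.
    subprocess.run(
        ["docker", "cp", "faster-whisper-small",
         "nc_app_stt_whisper2:/nc_app_llm2_data/"],
        check=True,
    )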