From f9f34fc652e796f281fdfe8857747b549bc712c5 Mon Sep 17 00:00:00 2001
From: Marcel Klehr
Date: Thu, 4 Jul 2024 10:40:41 +0200
Subject: [PATCH] docs(AI/LLM2): Update requirements and document model configuration

Signed-off-by: Marcel Klehr
---
 admin_manual/ai/app_llm2.rst | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/admin_manual/ai/app_llm2.rst b/admin_manual/ai/app_llm2.rst
index 4c4b73210..87b5fe108 100644
--- a/admin_manual/ai/app_llm2.rst
+++ b/admin_manual/ai/app_llm2.rst
@@ -18,10 +18,15 @@ Requirements
 
 * This app is built as an External App and thus depends on AppAPI v2.3.0 or higher
 * Nextcloud AIO is supported
-* Using GPU is currently not supported
+* We currently support NVIDIA GPUs and x86_64 CPUs
+* GPU Sizing
+
+  * An NVIDIA GPU with at least 8GB VRAM
+  * At least 12GB of system RAM
 
 * CPU Sizing
 
+  * At least 12GB of system RAM
   * The more cores you have and the more powerful the CPU the better, we recommend 10-20 cores
   * The app will hog all cores by default, so it is usually better to run it on a separate machine
 
@@ -42,6 +47,56 @@ This app allows supplying alternate LLM models as *gguf* files in the ``/nc_app_
 3. Restart the llm2 ExApp
 4. Select the new model in the Nextcloud AI admin settings
+
+Configuring alternate models
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Since every model requires slightly different inference parameters, you can supply a configuration file for each alternate model file you provide.
+
+The configuration file for a model must have the same name as the model file, but end in ``.json`` instead of ``.gguf``. For example, the configuration for ``my-model.gguf`` must be named ``my-model.json``.
+
+The strings ``{system_prompt}`` and ``{user_prompt}`` are placeholders that the app fills in at inference time, so they must be part of your prompt template.
+
+Here is an example config file for a Llama 2 based model that uses the ChatML prompt format:
+
+.. code-block:: json
+
+    {
+      "prompt": "<|im_start|> system\n{system_prompt}\n<|im_end|>\n<|im_start|> user\n{user_prompt}\n<|im_end|>\n<|im_start|> assistant\n",
+      "gpt4all_config": {
+        "max_tokens": 4096,
+        "n_predict": 2048,
+        "stop": ["<|im_end|>"]
+      }
+    }
+
+Here is an example configuration for Llama 3:
+
+.. code-block:: json
+
+    {
+      "prompt": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
+      "gpt4all_config": {
+        "max_tokens": 8000,
+        "n_predict": 4000,
+        "stop": ["<|eot_id|>"]
+      }
+    }
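+
+To put the pieces together, here is a minimal sketch of a configuration for a hypothetical model file named ``my-model.gguf``, stored next to it as ``my-model.json``. The prompt template, token limits and stop strings below are illustrative placeholders only and must be adapted to the prompt format of the model you actually use:
+
+.. code-block:: json
+
+    {
+      "prompt": "### System:\n{system_prompt}\n\n### User:\n{user_prompt}\n\n### Response:\n",
+      "gpt4all_config": {
+        "max_tokens": 2048,
+        "n_predict": 1024,
+        "stop": ["### User:"]
+      }
+    }
+
+The correct prompt template and stop strings are usually documented on the model card of the model you download; ``max_tokens`` should not exceed the model's context window.
 
 Scaling
 -------
 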