feat(exapp_development): add Kubernetes setup instructions

Signed-off-by: Edward Ly <contact@edward.ly>
This commit is contained in:
Edward Ly
2026-03-11 17:41:14 +01:00
parent 241b4c48c9
commit f986f22c93
7 changed files with 1331 additions and 23 deletions


@@ -1,22 +0,0 @@
Scaling
=======
AppAPI delegates the scaling task to the ExApp itself.
This means that the ExApp must be designed in a way to be able to scale vertically.
As for the horizontal scaling, it is currently not possible except by using,
for example, a Server-Worker architecture, which is a good way to support basic scaling capabilities.
In this case, the Server is your ExApp and the Workers are the external machines that can work with the ExApp
using Nextcloud user authentication.
Additional clients (or workers) can be (optionally) added (or attached) to the ExApp
to increase the capacity and performance.
GPUs scaling
------------
Currently, if a Deploy daemon configured with GPUs available,
AppAPI by default will attach all available GPU devices to each ExApp container on this Deploy daemon.
This means that these GPUs are shared between all ExApps on the same Deploy daemon.
Therefore, for the ExApps that require heavy use of GPUs,
it is recommended to have a separate Deploy daemon (host) for them.


@@ -19,6 +19,5 @@ or provide a brief answer.
DockerContainerRegistry
DockerSocketProxy
GpuSupport
Scaling
BehindCompanyProxy
Troubleshooting


@@ -7,6 +7,7 @@ ExApp development
Introduction
DevSetup
scaling/index.rst
development_overview/index.rst
tech_details/index.rst
faq/index.rst


@@ -0,0 +1,366 @@
Emulating AppAPI
================
This section documents the ``curl`` commands used to emulate AppAPI when
testing HaRP's Kubernetes backend.
Prerequisites
-------------
- HaRP is reachable at: ``http://nextcloud.local/exapps``
- HaRP was started with the same shared key as used below
(``HP_SHARED_KEY``)
- HaRP has Kubernetes backend enabled (``HP_K8S_ENABLED=true``) and can
access the k8s API
- ``kubectl`` is configured to point to the same cluster HaRP uses
- Optional: ``jq`` for parsing JSON responses
--------------
0. Environment variables
------------------------
.. code:: bash
# .env
EXAPPS_URL="http://nextcloud.local/exapps"
APPAPI_URL="${EXAPPS_URL}/app_api"
HP_SHARED_KEY="some_very_secure_password"
# Optional: Nextcloud base (only used by ExApp container env in this guide)
NEXTCLOUD_URL="http://nextcloud.local"
.. code:: bash
source .env
.. note::
All AppAPI-emulation calls go to ``$APPAPI_URL/...`` and require the header ``harp-shared-key``.
.. note::
You can also hit the agent directly on
``http://127.0.0.1:8200/...`` for debugging, but that bypasses the
HAProxy/AppAPI path and may skip shared-key enforcement depending
on your routing.
1. Check if ExApp is present (k8s deployment exists)
----------------------------------------------------
.. code:: bash
curl -sS \
-H "harp-shared-key: $HP_SHARED_KEY" \
-H "Content-Type: application/json" \
-X POST \
-d '{
"name": "test-deploy",
"instance_id": ""
}' \
"$APPAPI_URL/k8s/exapp/exists"
Expected output:
.. code:: json
{"exists": true}
or
.. code:: json
{"exists": false}
2. Create ExApp (PVC + Deployment with replicas=0)
--------------------------------------------------
.. code:: bash
curl -sS \
-H "harp-shared-key: $HP_SHARED_KEY" \
-H "Content-Type: application/json" \
-X POST \
-d '{
"name": "test-deploy",
"instance_id": "",
"image": "ghcr.io/nextcloud/test-deploy:latest",
"environment_variables": [
"APP_ID=test-deploy",
"APP_DISPLAY_NAME=Test Deploy",
"APP_VERSION=1.2.1",
"APP_HOST=0.0.0.0",
"APP_PORT=23000",
"NEXTCLOUD_URL='"$NEXTCLOUD_URL"'",
"APP_SECRET=some-dev-secret",
"APP_PERSISTENT_STORAGE=/nc_app_test-deploy_data"
],
"resource_limits": { "cpu": "500m", "memory": "512Mi" }
}' \
"$APPAPI_URL/k8s/exapp/create"
Expected output (example):
.. code:: json
{"name":"nc-app-test-deploy"}
3. Start ExApp (scale replicas to 1)
------------------------------------
.. code:: bash
curl -sS \
-H "harp-shared-key: $HP_SHARED_KEY" \
-H "Content-Type: application/json" \
-X POST \
-d '{
"name": "test-deploy",
"instance_id": ""
}' \
"$APPAPI_URL/k8s/exapp/start"
Expected: HTTP 204.
4. Wait for ExApp to become Ready
---------------------------------
.. code:: bash
curl -sS \
-H "harp-shared-key: $HP_SHARED_KEY" \
-H "Content-Type: application/json" \
-X POST \
-d '{
"name": "test-deploy",
"instance_id": ""
}' \
"$APPAPI_URL/k8s/exapp/wait_for_start"
Expected output (example):
.. code:: json
{
"started": true,
"status": "running",
"health": "ready",
"reason": null,
"message": null
}
5. Expose + register in HaRP
----------------------------
5.1 NodePort (default behavior)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**Minimal (uses defaults, may auto-pick a node address):**
.. code:: bash
EXPOSE_JSON=$(
curl -sS \
-H "harp-shared-key: $HP_SHARED_KEY" \
-H "Content-Type: application/json" \
-X POST \
-d '{
"name": "test-deploy",
"instance_id": "",
"port": 23000,
"expose_type": "nodeport"
}' \
"$APPAPI_URL/k8s/exapp/expose"
)
echo "$EXPOSE_JSON"
**Recommended (provide a stable host reachable by HaRP):**
.. code:: bash
# Example: edge node IP / VIP / L4 LB that forwards NodePort range
UPSTREAM_HOST="172.18.0.2"
EXPOSE_JSON=$(
curl -sS \
-H "harp-shared-key: $HP_SHARED_KEY" \
-H "Content-Type: application/json" \
-X POST \
-d '{
"name": "test-deploy",
"instance_id": "",
"port": 23000,
"expose_type": "nodeport",
"upstream_host": "'"$UPSTREAM_HOST"'"
}' \
"$APPAPI_URL/k8s/exapp/expose"
)
echo "$EXPOSE_JSON"
5.2 ClusterIP (only if HaRP can reach ClusterIP + resolve service DNS)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
EXPOSE_JSON=$(
curl -sS \
-H "harp-shared-key: $HP_SHARED_KEY" \
-H "Content-Type: application/json" \
-X POST \
-d '{
"name": "test-deploy",
"instance_id": "",
"port": 23000,
"expose_type": "clusterip"
}' \
"$APPAPI_URL/k8s/exapp/expose"
)
echo "$EXPOSE_JSON"
5.3 Manual (HaRP does not create or inspect any Service)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
EXPOSE_JSON=$(
curl -sS \
-H "harp-shared-key: $HP_SHARED_KEY" \
-H "Content-Type: application/json" \
-X POST \
-d '{
"name": "test-deploy",
"instance_id": "",
"port": 23000,
"expose_type": "manual",
"upstream_host": "exapp-test-deploy.internal",
"upstream_port": 23000
}' \
"$APPAPI_URL/k8s/exapp/expose"
)
echo "$EXPOSE_JSON"
6. Extract exposed host/port for follow-up tests (requires ``jq``)
------------------------------------------------------------------
.. code:: bash
EXAPP_HOST=$(echo "$EXPOSE_JSON" | jq -r '.host')
EXAPP_PORT=$(echo "$EXPOSE_JSON" | jq -r '.port')
echo "ExApp upstream endpoint: ${EXAPP_HOST}:${EXAPP_PORT}"
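If ``jq`` is not installed, the same two fields can be pulled with ``python3`` (standard library only). A sketch, using a hard-coded ``SAMPLE`` stand-in for ``$EXPOSE_JSON`` so it runs offline; the real response's ``host``/``port`` values will differ:

```shell
# Fallback without jq: parse the expose response using python3's json module.
# SAMPLE is a hypothetical stand-in for "$EXPOSE_JSON".
SAMPLE='{"host":"172.18.0.2","port":31234}'
EXAPP_HOST=$(printf '%s' "$SAMPLE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["host"])')
EXAPP_PORT=$(printf '%s' "$SAMPLE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["port"])')
echo "ExApp upstream endpoint: ${EXAPP_HOST}:${EXAPP_PORT}"
# -> ExApp upstream endpoint: 172.18.0.2:31234
```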
7. Check ``/heartbeat`` via HaRP routing (AppAPI-style direct routing headers)
------------------------------------------------------------------------------
This checks HaRP's ability to route to the ExApp given an explicit
upstream host/port and AppAPI-style authorization header.
7.1 Build ``authorization-app-api`` value
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
HaRP typically expects this value to be the **base64-encoded value of**
``user_id:APP_SECRET`` (similar to HTTP Basic without the ``Basic``
prefix). For an “anonymous” style request, use ``:APP_SECRET``.
.. code:: bash
# Option A: anonymous-style
AUTH_APP_API=$(printf '%s' ':some-dev-secret' | base64 | tr -d '\n')
# Option B: user-scoped style (example user "admin")
# AUTH_APP_API=$(printf '%s' 'admin:some-dev-secret' | base64 | tr -d '\n')
7.2 Call heartbeat
~~~~~~~~~~~~~~~~~~
.. code:: bash
curl -sS \
"http://nextcloud.local/exapps/test-deploy/heartbeat" \
-H "harp-shared-key: $HP_SHARED_KEY" \
-H "ex-app-version: 1.2.1" \
-H "ex-app-id: test-deploy" \
-H "ex-app-host: $EXAPP_HOST" \
-H "ex-app-port: $EXAPP_PORT" \
-H "authorization-app-api: $AUTH_APP_API"
If this fails with auth-related errors, verify:
- ``APP_SECRET`` in the ExApp matches what you used here,
- your HAProxy config expectations for ``authorization-app-api`` (raw
vs base64).
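A quick way to rule out encoding mistakes is to decode the header value and confirm it matches the expected ``user_id:APP_SECRET`` string:

```shell
# Sanity check: the header value should decode back to "user_id:APP_SECRET"
# (here the anonymous-style ":some-dev-secret" from above).
AUTH_APP_API=$(printf '%s' ':some-dev-secret' | base64 | tr -d '\n')
DECODED=$(printf '%s' "$AUTH_APP_API" | base64 -d)
echo "$DECODED"
# -> :some-dev-secret
```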
8. Stop and remove (API-based cleanup)
--------------------------------------
Stop ExApp (scale replicas to 0)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
curl -sS \
-H "harp-shared-key: $HP_SHARED_KEY" \
-H "Content-Type: application/json" \
-X POST \
-d '{
"name": "test-deploy",
"instance_id": ""
}' \
"$APPAPI_URL/k8s/exapp/stop"
Remove ExApp (Deployment + optional PVC; Service may be removed depending on HaRP version)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
curl -sS \
-H "harp-shared-key: $HP_SHARED_KEY" \
-H "Content-Type: application/json" \
-X POST \
-d '{
"name": "test-deploy",
"instance_id": "",
"remove_data": true
}' \
"$APPAPI_URL/k8s/exapp/remove"
--------------
Useful ``kubectl`` commands (debug / manual cleanup)
----------------------------------------------------
Check resources
~~~~~~~~~~~~~~~
.. code:: bash
kubectl get deploy,svc,pvc -n nextcloud-exapps -o wide | grep -E 'test-deploy|NAME' || true
kubectl get pods -n nextcloud-exapps -o wide
Delete Service (if it was exposed and needs manual cleanup)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
kubectl delete svc nc-app-test-deploy -n nextcloud-exapps
Delete Deployment
~~~~~~~~~~~~~~~~~
.. code:: bash
kubectl delete deployment nc-app-test-deploy -n nextcloud-exapps
Delete PVC (data)
~~~~~~~~~~~~~~~~~
PVC name is derived from ``nc_app_test-deploy_data`` and sanitized for
k8s, typically: ``nc-app-test-deploy-data``
.. code:: bash
kubectl delete pvc nc-app-test-deploy-data -n nextcloud-exapps
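The exact sanitization logic is internal to HaRP; as a rough sketch of the apparent rule (Kubernetes object names must be lowercase alphanumerics and hyphens per RFC 1123), the mapping looks like:

```shell
# Hypothetical sketch, not HaRP's actual code: lowercase everything and
# replace underscores with hyphens to get a valid k8s object name.
sanitize_k8s_name() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr '_' '-'
}
NAME=$(sanitize_k8s_name "nc_app_test-deploy_data")
echo "$NAME"
# -> nc-app-test-deploy-data
```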


@@ -0,0 +1,521 @@
Autoscaling with KEDA
=====================
This section explains how to set up `KEDA <https://keda.sh/>`__ to auto-scale ExApp pods
(using the `llm2 <https://docs.nextcloud.com/server/latest/admin_manual/ai/app_llm2.html>`_ app as an example)
based on the Nextcloud TaskProcessing queue depth.
Prerequisites
-------------
- A working Nextcloud + HaRP + k8s setup (see
:ref:`scaling-kubernetes-setup`)
- An ExApp deployed and running (e.g. ``llm2`` with deployment name
``nc-app-llm2``)
- ``kubectl`` configured and pointing to the cluster
- ``helm`` installed (`install
guide <https://helm.sh/docs/intro/install/>`__)
- For GPU ExApps: the daemon must be registered with
``--compute_device=cuda``
Architecture overview
---------------------
.. mermaid::
graph TB
Users[Users submit tasks] --> Nextcloud["Nextcloud TaskProcessing Queue (scheduled + running tasks)"]
Nextcloud -->|"GET /ocs/v2.php/taskprocessing/queue_stats, Auth: Basic (admin app_password)"| KEDA["KEDA (metrics-api-server in k8s)"]
KEDA -->|"polls every pollingInterval (e.g. 15s), scales the deployment based on queue depth"| deployment["nc-app-llm2 deployment (1..N pods), each pod independently calls next_task()"]
KEDA uses a ``metrics-api`` trigger (HTTP polling) to query Nextcloud's
``queue_stats`` endpoint.
When the queue grows, KEDA scales the ExApp deployment up.
When the queue shrinks, KEDA scales it back down.
--------------
0. GPU Setup (kind cluster)
---------------------------
If your ExApp needs GPU (e.g. llm2), you must set up GPU passthrough in
the kind cluster.
0.1 Configure Docker on the host
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default --cdi.enabled
sudo nvidia-ctk config --set accept-nvidia-visible-devices-as-volume-mounts=true --in-place
sudo systemctl restart docker
0.2 Create kind cluster with GPU support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: yaml
# kind-gpu-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
extraMounts:
- hostPath: /dev/null
containerPath: /var/run/nvidia-container-devices/all
.. code:: bash
kind create cluster --name nc-exapps --config kind-gpu-config.yaml
0.3 Install nvidia-container-toolkit inside the kind node
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
docker exec nc-exapps-control-plane bash -c '
apt-get update -y && apt-get install -y gnupg2 curl &&
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg &&
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed "s#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g" \
> /etc/apt/sources.list.d/nvidia-container-toolkit.list &&
apt-get update && apt-get install -y nvidia-container-toolkit
'
0.4 Configure containerd and restart
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
docker exec nc-exapps-control-plane bash -c '
nvidia-ctk runtime configure --runtime=containerd --set-as-default &&
systemctl restart containerd
'
0.5 Install NVIDIA device plugin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For a single GPU shared across multiple pods, use **time-slicing**.
First create a ConfigMap with the number of replicas (virtual GPUs):
.. code:: bash
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: nvidia-device-plugin-config
namespace: kube-system
data:
config.yaml: |
version: v1
sharing:
timeSlicing:
renameByDefault: false
resources:
- name: nvidia.com/gpu
replicas: 4
EOF
Then deploy the device plugin with the config:
.. code:: bash
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: nvidia-device-plugin-daemonset
namespace: kube-system
spec:
selector:
matchLabels:
name: nvidia-device-plugin-ds
template:
metadata:
labels:
name: nvidia-device-plugin-ds
spec:
tolerations:
- key: nvidia.com/gpu
operator: Exists
effect: NoSchedule
priorityClassName: system-node-critical
containers:
- image: nvcr.io/nvidia/k8s-device-plugin:v0.17.0
name: nvidia-device-plugin-ctr
args: ["--config-file=/config/config.yaml"]
env:
- name: FAIL_ON_INIT_ERROR
value: "false"
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
volumeMounts:
- name: device-plugin
mountPath: /var/lib/kubelet/device-plugins
- name: plugin-config
mountPath: /config
volumes:
- name: device-plugin
hostPath:
path: /var/lib/kubelet/device-plugins
- name: plugin-config
configMap:
name: nvidia-device-plugin-config
items:
- key: config.yaml
path: config.yaml
EOF
0.6 Verify GPU is visible
~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
kubectl get nodes -o json | python3 -c "
import json,sys
for n in json.load(sys.stdin)['items']:
gpu = n['status']['capacity'].get('nvidia.com/gpu','N/A')
print(f'{n[\"metadata\"][\"name\"]}: nvidia.com/gpu = {gpu}')
"
Expected: ``nvidia.com/gpu = 4`` (or your configured replicas count).
0.7 Test GPU from a pod
~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
kubectl run gpu-test --image=nvidia/cuda:12.6.3-base-ubuntu24.04 --restart=Never \
--overrides='{"spec":{"containers":[{"name":"gpu-test","image":"nvidia/cuda:12.6.3-base-ubuntu24.04","command":["nvidia-smi"],"resources":{"limits":{"nvidia.com/gpu":"1"}}}]}}' \
-n nextcloud-exapps
sleep 30 && kubectl logs gpu-test -n nextcloud-exapps
kubectl delete pod gpu-test -n nextcloud-exapps
--------------
1. Install KEDA
---------------
.. code:: bash
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
Verify:
.. code:: bash
kubectl get pods -n keda
# All pods should be Running
2. DNS setup (kind only)
------------------------
KEDA pods need to resolve ``nextcloud.local``. **HaRP does this
automatically now** — when ``HP_K8S_HOST_ALIASES`` is set, HaRP patches
the CoreDNS ``ConfigMap`` on startup and restarts CoreDNS so that every
pod in the cluster (including KEDA) can resolve the configured
hostnames.
If you need to do it manually (or verify), the commands are:
.. code:: bash
# Get the nginx proxy IP
PROXY_IP=$(docker inspect master-proxy-1 \
--format '{{(index .NetworkSettings.Networks "master_default").IPAddress}}')
echo "Proxy IP: $PROXY_IP"
# Write the Corefile with the correct IP
cat > /tmp/Corefile << EOF
.:53 {
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
prometheus :9153
hosts {
${PROXY_IP} nextcloud.local
fallthrough
}
forward . /etc/resolv.conf {
max_concurrent 1000
}
cache 30
loop
reload
loadbalance
}
EOF
kubectl create configmap coredns -n kube-system \
--from-file=Corefile=/tmp/Corefile \
--dry-run=client -o yaml | kubectl apply -f -
kubectl rollout restart deployment coredns -n kube-system
Verify:
.. code:: bash
kubectl run dns-test --rm -i --restart=Never --image=busybox -- nslookup nextcloud.local
3. Create a Nextcloud App Password
----------------------------------
KEDA needs credentials to poll the ``queue_stats`` endpoint. The
endpoint is admin-only.
1. Log in to Nextcloud as admin
2. Go to **Settings > Security > Devices & sessions**
3. Enter a name (e.g. ``keda-scaler``) and click **Create new app
password**
4. Copy the password into a **.env** file
.. code:: bash
# .env
NC_USER="admin"
NC_APP_PASSWORD="<the-app-password-you-created>"
NC_URL="https://nextcloud.local"
Verify:
.. code:: bash
source .env
curl -s -k -u "${NC_USER}:${NC_APP_PASSWORD}" \
"${NC_URL}/ocs/v2.php/taskprocessing/queue_stats?format=json"
Expected:
.. code:: json
{"ocs":{"meta":{"status":"ok","statuscode":200,"message":"OK"},"data":{"scheduled":0,"running":0}}}
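The field KEDA will scale on later is ``ocs.data.scheduled`` (the ``valueLocation`` in step 6). It can be extracted from such a response with ``python3`` (standard library only); ``RESPONSE`` below is a stand-in for the ``curl`` output:

```shell
# Extract the queue depth KEDA reads from the queue_stats response.
RESPONSE='{"ocs":{"meta":{"status":"ok","statuscode":200,"message":"OK"},"data":{"scheduled":12,"running":2}}}'
SCHEDULED=$(printf '%s' "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["ocs"]["data"]["scheduled"])')
echo "$SCHEDULED"
# -> 12
```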
4. Create k8s secret
--------------------
.. code:: bash
kubectl create secret generic nextcloud-keda-auth \
--namespace=nextcloud-exapps \
--from-literal=username="${NC_USER}" \
--from-literal=password="${NC_APP_PASSWORD}"
5. Create KEDA TriggerAuthentication
------------------------------------
.. code:: bash
cat <<'EOF' | kubectl apply -f -
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: nextcloud-auth
namespace: nextcloud-exapps
spec:
secretTargetRef:
- parameter: username
name: nextcloud-keda-auth
key: username
- parameter: password
name: nextcloud-keda-auth
key: password
EOF
6. Create KEDA ScaledObject
---------------------------
.. note::
Nextcloud OCS returns XML by default. Always include ``format=json`` in the URL.
Task type filter
~~~~~~~~~~~~~~~~
llm2 registers many task types. Use a comma-separated list to scale on
all of them:
::
?taskTypeId=core:text2text,core:text2text:chat,core:text2text:summary,core:text2text:headline,core:text2text:topics,core:text2text:simplification,core:text2text:reformulation,core:contextwrite,core:text2text:changetone,core:text2text:chatwithtools,core:text2text:proofread
Apply
~~~~~
.. code:: yaml
# keda-llm2-scaler.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: llm2-scaler
namespace: nextcloud-exapps
spec:
scaleTargetRef:
name: nc-app-llm2
pollingInterval: 15
cooldownPeriod: 120
initialCooldownPeriod: 60
minReplicaCount: 1
maxReplicaCount: 4
triggers:
- type: metrics-api
metadata:
url: "https://nextcloud.local/ocs/v2.php/taskprocessing/queue_stats?format=json&taskTypeId=core:text2text,core:text2text:chat,core:text2text:summary"
valueLocation: "ocs.data.scheduled"
targetValue: "5"
authMode: "basic"
unsafeSsl: "true"
authenticationRef:
name: nextcloud-auth
.. code:: bash
kubectl apply -f keda-llm2-scaler.yaml
Scaling formula
~~~~~~~~~~~~~~~
::
desiredReplicas = ceil( metricValue / targetValue )
=============== ============= ===================
Scheduled tasks targetValue=5 Result
=============== ============= ===================
0               \-            1 (minReplicaCount)
3               ceil(3/5)=1   1 pod
12              ceil(12/5)=3  3 pods
50              ceil(50/5)=10 4 (capped at max)
=============== ============= ===================
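As a sanity check, the formula can be reproduced in plain bash (integer ceiling division clamped to the min/max replica counts; the helper name is ours, not KEDA's):

```shell
# desired_replicas METRIC TARGET MIN MAX: ceil(METRIC/TARGET), clamped to [MIN, MAX].
desired_replicas() {
  local metric=$1 target=$2 min=$3 max=$4
  local d=$(( (metric + target - 1) / target ))   # integer ceiling division
  if [ "$d" -lt "$min" ]; then d=$min; fi
  if [ "$d" -gt "$max" ]; then d=$max; fi
  echo "$d"
}
desired_replicas 0 5 1 4    # -> 1 (minReplicaCount)
desired_replicas 12 5 1 4   # -> 3
desired_replicas 50 5 1 4   # -> 4 (capped at maxReplicaCount)
```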
7. Verify and Monitor
---------------------
Quick status
~~~~~~~~~~~~
.. code:: bash
kubectl get scaledobject -n nextcloud-exapps && echo && \
kubectl get deploy nc-app-llm2 -n nextcloud-exapps && echo && \
kubectl get pods -n nextcloud-exapps -l app=nc-app-llm2 -o wide
- ``READY=True`` - KEDA can reach the metrics endpoint
- ``ACTIVE=False`` - no tasks queued
- ``AVAILABLE=1`` - one pod running (minReplicaCount)
Watch scaling live
~~~~~~~~~~~~~~~~~~
.. code:: bash
# Terminal 1: pods
kubectl get pods -n nextcloud-exapps -l app=nc-app-llm2 -w
# Terminal 2: deployment
kubectl get deploy nc-app-llm2 -n nextcloud-exapps -w
# Terminal 3: KEDA logs
kubectl logs -n keda -l app=keda-operator -f --tail=5
Check HPA (KEDA creates this)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
kubectl get hpa -n nextcloud-exapps
kubectl describe hpa -n nextcloud-exapps
Full dashboard
~~~~~~~~~~~~~~
.. code:: bash
echo "=== ScaledObject ===" && \
kubectl get scaledobject -n nextcloud-exapps && echo && \
echo "=== HPA ===" && \
kubectl get hpa -n nextcloud-exapps && echo && \
echo "=== Deployment ===" && \
kubectl get deploy nc-app-llm2 -n nextcloud-exapps && echo && \
echo "=== Pods ===" && \
kubectl get pods -n nextcloud-exapps -l app=nc-app-llm2 -o wide && echo && \
echo "=== Queue ===" && \
curl -s -k -u "${NC_USER}:${NC_APP_PASSWORD}" \
"${NC_URL}/ocs/v2.php/taskprocessing/queue_stats?format=json"
--------------
Tuning Guide
------------
+---------------------------+---------+---------+-------------------------------------+
| Parameter | Example | Default | What it does |
+===========================+=========+=========+=====================================+
| ``pollingInterval`` | 15 | 30 | Seconds between polls. |
| | | | Lower = faster reaction |
+---------------------------+---------+---------+-------------------------------------+
| ``cooldownPeriod`` | 120 | 300 | Seconds to wait before scaling down |
+---------------------------+---------+---------+-------------------------------------+
| ``initialCooldownPeriod`` | 60 | 0 | Wait after new pod starts. Set to |
| | | | 60 for LLM model loading time |
+---------------------------+---------+---------+-------------------------------------+
| ``minReplicaCount`` | 1 | 0 | Min pods. Must be 1+ (AppAPI needs |
| | | | at least one pod for heartbeat) |
+---------------------------+---------+---------+-------------------------------------+
| ``maxReplicaCount`` | 4 | 100 | Max pods. Match your GPU count or |
| | | | time-slicing replicas |
+---------------------------+---------+---------+-------------------------------------+
| ``targetValue`` | 5 | \- | Tasks per pod. |
| | | | Lower = more pods sooner |
+---------------------------+---------+---------+-------------------------------------+
GPU time-slicing notes
~~~~~~~~~~~~~~~~~~~~~~
- One physical GPU can be shared by multiple pods using NVIDIA
time-slicing
- Each llm2 pod uses about 8GB VRAM (model dependent)
- RTX 5090 (32GB): can run 3-4 pods with time-slicing replicas=4
- RTX 4090 (24GB): can run 2-3 pods with time-slicing replicas=3
- Set ``maxReplicaCount`` to match your time-slicing replicas
- Time-slicing gives each pod an equal share of GPU time
LLM notes
~~~~~~~~~
- Model loading takes 30-60s. New pods are not ready right away
- Use ``initialCooldownPeriod`` to avoid over-scaling during warmup
- PVC access mode is ``ReadWriteOnce``. Works on single-node only
- Multi-node clusters are not supported yet
--------------
Cleanup
-------
.. code:: bash
# Remove KEDA ScaledObject
kubectl delete scaledobject llm2-scaler -n nextcloud-exapps
# Remove auth resources
kubectl delete triggerauthentication nextcloud-auth -n nextcloud-exapps
kubectl delete secret nextcloud-keda-auth -n nextcloud-exapps


@@ -0,0 +1,411 @@
.. _scaling-kubernetes-setup:
Setting up Kubernetes
=====================
This guide will help you set up a local Kubernetes cluster
(via `kind <https://kind.sigs.k8s.io/>`__)
with HaRP and AppAPI for ExApp development.
After completing these steps you will be able to register a k8s deploy daemon in Nextcloud and deploy a test app.
Prerequisites
-------------
- Docker must be installed and running
- A `nextcloud-docker-dev <https://github.com/juliusknorr/nextcloud-docker-dev>`__ environment running at ``https://nextcloud.local``
- The Nextcloud container is on the ``master_default`` Docker
network
- ``kubectl`` installed (`install
guide <https://kubernetes.io/docs/tasks/tools/>`__)
- ``kind`` installed (`install
guide <https://kind.sigs.k8s.io/docs/user/quick-start/#installation>`__)
- HaRP repository cloned (e.g. ``~/nextcloud/HaRP``)
Architecture overview
---------------------
.. mermaid::
graph TB
OCC[Browser / OCC / API calls] -->|"Nextcloud (PHP, in Docker container)"| nginx[nginx proxy]
nginx -->|/exapps/| HaRP["HaRP (host network, port 8780)"]
HaRP -->|"k8s API calls (Deployments, Services, PVCs)"| kind["kind cluster (nc-exapps)"]
kind --> ExApp["ExApp pod (e.g. test-deploy)"]
- **HaRP** runs on the host network (``--network=host``) and
communicates with:
- The kind k8s API server (via ``https://127.0.0.1:<port>``)
- ExApp pods via NodePort services (via the kind node IP)
- **Nextcloud** reaches HaRP via the Docker network gateway IP
- **nginx proxy** forwards ``/exapps/`` requests to HaRP
--------------
.. _scaling-kubernetes-setup-step-1:
1. Create the kind Cluster
--------------------------
.. code:: bash
kind create cluster --name nc-exapps
Verify:
.. code:: bash
kubectl config use-context kind-nc-exapps
kubectl cluster-info
kubectl get nodes -o wide
Note the **API server URL** (e.g. ``https://127.0.0.1:37151``) and the
**node InternalIP** (e.g. ``172.18.0.2``):
.. code:: bash
# API server
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
# Node internal IP
kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}'
.. _scaling-kubernetes-setup-step-2:
2. Create Namespace and RBAC
----------------------------
.. code:: bash
# Create the ExApps namespace
kubectl create namespace nextcloud-exapps
# Create a ServiceAccount for HaRP
kubectl -n nextcloud-exapps create serviceaccount harp-exapps
# Grant cluster-admin (for development; restrict in production)
kubectl create clusterrolebinding harp-exapps-admin \
--clusterrole=cluster-admin \
--serviceaccount=nextcloud-exapps:harp-exapps
Generate a bearer token (valid for 1 year):
.. code:: bash
kubectl -n nextcloud-exapps create token harp-exapps --duration=8760h
.. note::
The ``redeploy_host_k8s.sh`` script generates this token
automatically, so you don't need to copy it manually.
.. _scaling-kubernetes-setup-step-3:
3. Configure the nginx Proxy
----------------------------
The nextcloud-docker-dev nginx proxy must forward ``/exapps/`` to HaRP.
Find the gateway IP of the ``master_default`` Docker network (this is
how containers reach the host):
.. code:: bash
docker network inspect master_default \
--format '{{range .IPAM.Config}}Gateway: {{.Gateway}}{{end}}'
Typically this is your host IP like ``192.168.21.1`` (may vary on your
machine).
Edit the nginx vhost file:
.. code:: bash
# Path relative to your nextcloud-docker-dev checkout:
# data/nginx/vhost.d/nextcloud.local_location
Set the content to:
.. code:: nginx
location /exapps/ {
set $harp_addr <GATEWAY_IP>:8780;
proxy_pass http://$harp_addr;
# Forward the true client identity
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
Replace ``<GATEWAY_IP>`` with the gateway from above
(e.g. ``192.168.21.1``).
Reload nginx:
.. code:: bash
docker exec master-proxy-1 nginx -s reload
.. _scaling-kubernetes-setup-step-4:
4. Build and Deploy HaRP
------------------------
From the HaRP repository root:
.. code:: bash
cd ~/nextcloud/HaRP
bash development/redeploy_host_k8s.sh
The script will:
1. Auto-detect the k8s API server URL
2. Generate a fresh bearer token
3. Build the HaRP Docker image
4. Start HaRP with k8s backend enabled on host network
Wait for HaRP to become healthy:
.. code:: bash
docker ps | grep harp
# Should show "(healthy)" after ~15 seconds
Check logs if needed:
.. code:: bash
docker logs appapi-harp --tail=20
.. _scaling-kubernetes-setup-step-5:
5. Register the k8s Deploy Daemon in Nextcloud
----------------------------------------------
Run this inside the Nextcloud container (replace ``<NC_CONTAINER>`` with
your container ID or name, and ``<GATEWAY_IP>`` with the gateway from
:ref:`Step 3 <scaling-kubernetes-setup-step-3>`):
.. code:: bash
docker exec <NC_CONTAINER> sudo -E -u www-data php occ app_api:daemon:register \
k8s_local "Kubernetes Local" "kubernetes-install" \
"http" "<GATEWAY_IP>:8780" "http://nextcloud.local" \
--harp \
--harp_shared_key "some_very_secure_password" \
--harp_frp_address "<GATEWAY_IP>:8782" \
--k8s \
--k8s_expose_type=nodeport \
--set-default
Verify:
.. code:: bash
docker exec <NC_CONTAINER> sudo -E -u www-data php occ app_api:daemon:list
.. _scaling-kubernetes-setup-step-6:
6. Run Test Deploy
------------------
Via OCC
~~~~~~~
.. code:: bash
docker exec <NC_CONTAINER> sudo -E -u www-data php occ app_api:app:register test-deploy k8s_local \
--info-xml https://raw.githubusercontent.com/nextcloud/test-deploy/main/appinfo/info.xml \
--test-deploy-mode
Expected output:
::
ExApp test-deploy deployed successfully.
ExApp test-deploy successfully registered.
Via API (same as what the Admin UI uses)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
# Start test deploy
curl -X POST -u admin:admin -H "OCS-APIREQUEST: true" -k \
"https://nextcloud.local/index.php/apps/app_api/daemons/k8s_local/test_deploy"
# Check status
curl -u admin:admin -H "OCS-APIREQUEST: true" -k \
"https://nextcloud.local/index.php/apps/app_api/daemons/k8s_local/test_deploy/status"
# Stop and clean up
curl -X DELETE -u admin:admin -H "OCS-APIREQUEST: true" -k \
"https://nextcloud.local/index.php/apps/app_api/daemons/k8s_local/test_deploy"
Verify k8s Resources
~~~~~~~~~~~~~~~~~~~~
.. code:: bash
kubectl get deploy,svc,pvc,pods -n nextcloud-exapps -o wide
Unregister
~~~~~~~~~~
.. code:: bash
docker exec <NC_CONTAINER> sudo -E -u www-data php occ app_api:app:unregister test-deploy
--------------
Cluster Overview
----------------
==================== ===========================
Component            Value
==================== ===========================
**Type**             kind (Kubernetes in Docker)
**Cluster Name**     ``nc-exapps``
**Node**             ``nc-exapps-control-plane``
**ExApps Namespace** ``nextcloud-exapps``
**ServiceAccount**   ``harp-exapps``
==================== ===========================
--------------
Monitoring Commands
-------------------
Cluster Status
~~~~~~~~~~~~~~
.. code:: bash
kubectl cluster-info
kubectl get nodes -o wide
kubectl get pods -n nextcloud-exapps
kubectl get pods -n nextcloud-exapps -w # watch in real-time
Pod Inspection
~~~~~~~~~~~~~~
.. code:: bash
kubectl describe pod <pod-name> -n nextcloud-exapps
kubectl logs <pod-name> -n nextcloud-exapps
kubectl logs -f <pod-name> -n nextcloud-exapps # follow logs
kubectl logs --previous <pod-name> -n nextcloud-exapps # after restart
Resources
~~~~~~~~~
.. code:: bash
kubectl get svc,deploy,pvc -n nextcloud-exapps
kubectl get all -n nextcloud-exapps
HaRP Logs
~~~~~~~~~
.. code:: bash
docker logs appapi-harp --tail=50
docker logs -f appapi-harp # follow
--------------
Troubleshooting
---------------
HaRP can't reach k8s API
~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
# Check if kind container is running
docker ps | grep kind
# Verify API server is reachable from host
curl -k https://127.0.0.1:37151/version
Nextcloud can't reach HaRP
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
# From inside the Nextcloud container, test connectivity to HaRP:
docker exec <NC_CONTAINER> curl -s http://<GATEWAY_IP>:8780/
# Should return "404 Not Found" (HaRP is responding)
# If connection refused: check HaRP is running and gateway IP is correct
Heartbeat fails after successful deploy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Check HaRP logs for routing errors:
.. code:: bash
docker logs appapi-harp --tail=20
HaRP lazily resolves the k8s Service upstream on first request after a
restart, so restarting HaRP does **not** require re-deploying ExApps. If
heartbeat still fails, verify the k8s Service exists and is reachable:
.. code:: bash
kubectl get svc -n nextcloud-exapps
Pods stuck in Pending
~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
kubectl describe pod <pod-name> -n nextcloud-exapps
# Check Events section for scheduling or image pull issues
Image pull errors
~~~~~~~~~~~~~~~~~
The kind cluster needs to be able to pull images. For public images
(like ``ghcr.io/nextcloud/test-deploy:release``) this should work out of
the box.
Token expired
~~~~~~~~~~~~~
Regenerate by rerunning the redeploy script:
.. code:: bash
cd ~/nextcloud/HaRP
bash development/redeploy_host_k8s.sh
Clean up all ExApp resources
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
kubectl delete deploy,svc,pvc -n nextcloud-exapps --all
Reset everything
~~~~~~~~~~~~~~~~
.. code:: bash
# Remove daemon config
docker exec <NC_CONTAINER> sudo -E -u www-data php occ app_api:daemon:unregister k8s_local
# Delete kind cluster
kind delete cluster --name nc-exapps
# Remove HaRP container
docker rm -f appapi-harp
Then start again from :ref:`Step 1 <scaling-kubernetes-setup-step-1>`.


@@ -0,0 +1,32 @@
Scaling ExApps
==============
AppAPI delegates the scaling task to the ExApp itself.
This means that the ExApp must be designed so that it can scale vertically.
For horizontal scaling, we recommend using Kubernetes.
Alternatively, you can implement a Server-Worker architecture for basic scaling:
the Server is your ExApp, and the Workers are external machines that work with the ExApp
using Nextcloud user authentication.
Additional workers can optionally be attached to the ExApp
to increase capacity and performance.
The rest of this section explains how to set up and use Kubernetes for automated scaling.
Additional instructions are provided for GPU scaling if you have a GPU device.
.. note::
Currently, if a Deploy daemon is configured with GPUs available,
AppAPI will by default attach all available GPU devices to each ExApp container on this Deploy daemon.
This means that these GPUs are shared between all ExApps on the same Deploy daemon.
Therefore, for the ExApps that require heavy use of GPUs,
it is recommended to have a separate Deploy daemon (host) for them.
.. toctree::
:maxdepth: 2
KubernetesSetup
KEDASetup
AppAPIEmulation