Deploying Hermes Agent on OpenShift

Hermes Agent on OpenShift: Private by default, cloud access when needed - AI generated

Introduction

In this post, I want to describe how to deploy the Hermes Agent on OpenShift and wire it to a self-hosted model endpoint running on the same cluster. This is a direct continuation of two earlier posts: Deploying OpenShift on AWS, which covers getting a cluster into place, and the post on running the Red Hat AI Inference Server on OpenShift, which covers the model serving layer that Hermes will talk to.

If you want background on what Hermes Agent is and why it is worth running, the companion post Hermes Agent: A Personal AI That Gets More Useful Over Time covers that in more detail. This post focuses on the mechanics of getting it running on OpenShift.

Architecture

The setup connects two namespaces on the same cluster. The Red Hat AI Inference Server (RHAIIS) runs in the rhaiis namespace and serves a model through vLLM on port 8000. Hermes Agent runs in a separate hermes namespace and reaches the vLLM server over the internal cluster service network at rhaiis-vllm.rhaiis.svc.cluster.local:8000. No public route is involved in that hop.
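The service address follows the standard Kubernetes Service DNS pattern <service>.<namespace>.svc.cluster.local. As a small illustrative sketch (the helper name svc_base_url is hypothetical and not part of the repository), the base URL that ends up in the Hermes configuration can be built from those three parts:

```shell
# Hypothetical helper: build the in-cluster base URL for an OpenAI-compatible
# service from the Kubernetes DNS pattern <service>.<namespace>.svc.cluster.local.
svc_base_url() {
  local service=$1 namespace=$2 port=$3
  printf 'http://%s.%s.svc.cluster.local:%s/v1\n' "$service" "$namespace" "$port"
}

# Example: the RHAIIS endpoint used throughout this post.
svc_base_url rhaiis-vllm rhaiis 8000
```

The result matches the model.base_url value in the ConfigMap further down. Plain http is used there because the hop never leaves the cluster network.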

OpenRouter is wired as an automatic fallback. If the vLLM server is unavailable or returns an error, Hermes falls back to a remote model through OpenRouter without requiring any manual intervention.
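Hermes implements this fallback internally. Purely to illustrate the pattern, and not as a description of Hermes's own code, a primary-then-fallback call can be sketched in shell. The function try_with_fallback and the two stand-in commands are hypothetical:

```shell
# Conceptual sketch of the primary-then-fallback pattern; not Hermes code.
# Runs the primary command and, only if it exits non-zero, runs the fallback.
try_with_fallback() {
  local primary=$1 fallback=$2
  if "$primary"; then
    return 0
  fi
  echo "primary failed, switching to fallback" >&2
  "$fallback"
}

# Hypothetical stand-ins for "call the vLLM server" and "call OpenRouter".
call_primary()  { return 1; }                  # simulate an unavailable vLLM server
call_fallback() { echo "answer from fallback"; }

try_with_fallback call_primary call_fallback
```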

Externally, Hermes exposes an OpenAI-compatible API on port 8642, secured with a bearer token. An OpenShift Route with TLS termination handles the public endpoint.

Prerequisites

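A running RHAIIS deployment from the previous post: a deployment named rhaiis-vllm in the rhaiis namespace.
An OpenRouter API key for the fallback model.
The oc CLI, logged in as a user allowed to grant SCCs to service accounts (typically cluster-admin), plus jq and openssl on your workstation.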
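A quick preflight check before starting can save a failed rollout later. The following is a minimal sketch; the function name preflight is hypothetical, and it assumes you have exported your OpenRouter key as OPENROUTER_API_KEY in the current shell, which the steps in this post do not otherwise require:

```shell
# Hypothetical preflight helper; not part of the smichard/agent_on_ocp repository.
# Assumes the OpenRouter key is exported as OPENROUTER_API_KEY in this shell.
preflight() {
  local status=0
  if ! oc get deployment rhaiis-vllm -n rhaiis >/dev/null 2>&1; then
    echo "missing: deployment rhaiis-vllm in namespace rhaiis" >&2
    status=1
  fi
  if [ -z "${OPENROUTER_API_KEY:-}" ]; then
    echo "missing: OPENROUTER_API_KEY is not set" >&2
    status=1
  fi
  return "$status"
}

# Usage: preflight && echo "prerequisites look fine"
```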

Deploying Hermes Agent

All deployment files are available in the smichard/agent_on_ocp GitHub repository. The steps below apply them in sequence.

Clone the repository and change into it:

git clone https://github.com/smichard/agent_on_ocp.git
cd agent_on_ocp

Create the Namespace and ServiceAccount

oc new-project hermes

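Hermes Agent runs as UID 10000. The default restricted SCC in OpenShift (restricted-v2 on current releases) only allows UIDs from the range assigned to the namespace, so a fixed UID of 10000 is rejected. The deployment therefore needs a dedicated ServiceAccount with the anyuid SCC: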

oc create serviceaccount hermes -n hermes

oc adm policy add-scc-to-user anyuid \
  -z hermes \
  -n hermes

Create Secrets

Three secrets are needed: one for the vLLM bearer token, one for the OpenRouter fallback key, and one for the Hermes API server key that clients must present.

vLLM bearer token:

Hermes reads the OPENAI_API_KEY environment variable for custom OpenAI-compatible endpoints. The vLLM API key from the rhaiis namespace is passed in under that name:

export RHAIIS_API_KEY=$(oc get secret vllm-api-key-secret -n rhaiis \
  -o jsonpath='{.data.VLLM_API_KEY}' | base64 -d)

oc create secret generic hermes-vllm-secret \
  --from-literal=OPENAI_API_KEY="${RHAIIS_API_KEY}" \
  -n hermes
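Running oc create secret a second time fails because the Secret already exists. If you expect to re-run these steps, a common pattern is to render the Secret with --dry-run=client and pipe it into oc apply. The sketch below wraps that pattern in a hypothetical helper, apply_secret, which also refuses to store an empty value, the result you would get if the jsonpath lookup above pointed at the wrong secret or key:

```shell
# Hypothetical helper: create or update a single-key generic Secret in the
# hermes namespace. Refuses empty values, which usually indicate a wrong
# secret or key name in the lookup that produced them.
apply_secret() {
  local name=$1 key=$2 value=$3
  if [ -z "$value" ]; then
    echo "error: empty value for ${name}/${key}; not applying" >&2
    return 1
  fi
  # --dry-run=client renders the Secret locally; oc apply then creates it or
  # updates it in place, so the command can be re-run safely.
  oc create secret generic "$name" \
    --from-literal="${key}=${value}" \
    -n hermes --dry-run=client -o yaml | oc apply -f -
}

# Usage:
#   apply_secret hermes-vllm-secret OPENAI_API_KEY "${RHAIIS_API_KEY}"
#   apply_secret hermes-openrouter-secret OPENROUTER_API_KEY "${OPENROUTER_API_KEY}"
```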

OpenRouter fallback key:

oc create secret generic hermes-openrouter-secret \
  --from-literal=OPENROUTER_API_KEY=<your_openrouter_key> \
  -n hermes

Hermes API server key:

Clients calling the Hermes API must include this key as a bearer token. Generate a random value at creation time:

oc create secret generic hermes-api-secret \
  --from-literal=API_SERVER_KEY=$(openssl rand -hex 32) \
  -n hermes

Retrieve it later with:

oc get secret hermes-api-secret -n hermes \
  -o jsonpath='{.data.API_SERVER_KEY}' | base64 -d

Create the ConfigMap

The ConfigMap holds the Hermes Agent configuration file. It sets the primary model provider to the internal vLLM service and configures OpenRouter as the fallback:

apiVersion: v1
kind: ConfigMap
metadata:
  name: hermes-config
  namespace: hermes
  labels:
    app: hermes
data:
  config.yaml: |
    model:
      provider: "custom"
      base_url: "http://rhaiis-vllm.rhaiis.svc.cluster.local:8000/v1"
      default: "Qwen/Qwen3-Coder-30B-A3B-Instruct"

    fallback_model:
      provider: "openrouter"
      model: "anthropic/claude-sonnet-4-6"

    terminal:
      backend: "local"
      cwd: "/opt/data/workspace"
      timeout: 180
      lifetime_seconds: 300

    compression:
      enabled: true
      threshold: 0.50
      target_ratio: 0.20
      protect_last_n: 20

Adjust model.default to match the --served-model-name value used in the RHAIIS deployment, and set fallback_model.model to the OpenRouter model you want to use as a fallback. Then apply the file:

oc apply -f configmap.yaml
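A mismatch between model.default and the name vLLM actually serves is an easy mistake to make. One way to catch it is to compare the two. The sketch below is illustrative: configured_model and model_is_served are hypothetical helpers, the awk extraction is a naive line match that assumes a single default: key as in the ConfigMap above, and the port-forward usage assumes the rhaiis-vllm Service listens on port 8000 as described in the architecture section:

```shell
# Hypothetical helpers; not part of the smichard/agent_on_ocp repository.

# Read a Hermes config.yaml on stdin and print the value of the first
# "default:" key with quotes removed (naive, line-based, not a YAML parser).
configured_model() {
  awk '/^[ \t]*default:/ { gsub(/"/, "", $2); print $2; exit }'
}

# Read model IDs on stdin, one per line; succeed only if $1 is an exact match.
model_is_served() {
  grep -Fxq -- "$1"
}

# Usage sketch (needs a cluster, jq, and RHAIIS_API_KEY exported earlier):
#   oc port-forward svc/rhaiis-vllm -n rhaiis 8000:8000 &
#   want=$(oc get configmap hermes-config -n hermes \
#     -o jsonpath='{.data.config\.yaml}' | configured_model)
#   curl -sS http://localhost:8000/v1/models \
#     -H "Authorization: Bearer ${RHAIIS_API_KEY}" \
#     | jq -r '.data[].id' | model_is_served "$want" && echo "model name matches"
```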

Create a PersistentVolumeClaim

Hermes stores sessions, memories, and workspace data on a persistent volume:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hermes-data
  namespace: hermes
  labels:
    app: hermes
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

Apply the file to create the PVC:

oc apply -f pvc.yaml
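Depending on the default StorageClass, the claim may stay Pending until the Hermes pod is scheduled. That is expected when the class uses volumeBindingMode: WaitForFirstConsumer, which is typical for cloud block storage such as the gp3-csi class on AWS. A minimal status check, with a hypothetical function name:

```shell
# Hypothetical helper: print the PVC phase together with the binding mode of
# the StorageClass it uses, to tell "waiting for a pod" apart from a real problem.
pvc_report() {
  local phase class mode
  phase=$(oc get pvc hermes-data -n hermes -o jsonpath='{.status.phase}')
  class=$(oc get pvc hermes-data -n hermes -o jsonpath='{.spec.storageClassName}')
  mode=$(oc get storageclass "$class" -o jsonpath='{.volumeBindingMode}')
  echo "phase=${phase} storageClass=${class} bindingMode=${mode}"
}
```

A result of phase=Pending with bindingMode=WaitForFirstConsumer before the Deployment exists is normal; the claim binds once the pod is scheduled.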

Deploy Hermes Agent

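The Deployment mounts the ConfigMap and the PVC, injects the three secrets as environment variables, and runs the pod under the hermes ServiceAccount. The nvidia.com/gpu toleration only allows the pod to be scheduled on tainted GPU nodes; the container does not request a GPU. Check the Hermes Agent repository for the current container image reference before applying: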

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hermes
  namespace: hermes
  labels:
    app: hermes
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hermes
  template:
    metadata:
      labels:
        app: hermes
    spec:
      tolerations:
        - key: nvidia.com/gpu
          effect: NoSchedule
          operator: Exists
      serviceAccountName: hermes
      securityContext:
        runAsUser: 10000
        runAsNonRoot: true
        fsGroup: 10000
      volumes:
        - name: hermes-data
          persistentVolumeClaim:
            claimName: hermes-data
        - name: hermes-config
          configMap:
            name: hermes-config
      containers:
        - name: hermes
          image: nousresearch/hermes-agent:latest
          imagePullPolicy: Always
          args: ["gateway"]
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: hermes-vllm-secret
                  key: OPENAI_API_KEY
            - name: OPENROUTER_API_KEY
              valueFrom:
                secretKeyRef:
                  name: hermes-openrouter-secret
                  key: OPENROUTER_API_KEY
            - name: API_SERVER_KEY
              valueFrom:
                secretKeyRef:
                  name: hermes-api-secret
                  key: API_SERVER_KEY
            - name: API_SERVER_ENABLED
              value: "true"
            - name: API_SERVER_HOST
              value: "0.0.0.0"
            - name: API_SERVER_PORT
              value: "8642"
          ports:
            - name: api
              containerPort: 8642
              protocol: TCP
          volumeMounts:
            - name: hermes-data
              mountPath: /opt/data
            - name: hermes-config
              mountPath: /opt/data/config.yaml
              subPath: config.yaml
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "2"
              memory: "2Gi"
          livenessProbe:
            tcpSocket:
              port: 8642
            initialDelaySeconds: 30
            periodSeconds: 30
          readinessProbe:
            tcpSocket:
              port: 8642
            initialDelaySeconds: 15
            periodSeconds: 10
      restartPolicy: Always

Apply the file to create the deployment:

oc apply -f deployment.yaml

The API server is ready when the logs show:

[api_server] Listening on 0.0.0.0:8642

Hermes Gateway starting up, with 83 skills bundled and the messaging platform scheduler ready to accept requests
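Instead of watching the logs by hand, you can wait for the rollout and look for that line, then confirm that the pod was admitted under the anyuid SCC through the openshift.io/scc annotation OpenShift sets on each pod. A minimal sketch with hypothetical function names:

```shell
# Hypothetical helpers; not part of the smichard/agent_on_ocp repository.

# Wait for the rollout, then check the logs for the API server listener line.
wait_for_hermes() {
  oc rollout status deployment/hermes -n hermes --timeout=180s || return 1
  if oc logs deployment/hermes -n hermes | grep -q 'Listening on 0.0.0.0:8642'; then
    echo "Hermes API server is listening on 8642"
  else
    echo "rollout finished, but the listener line is not in the logs yet" >&2
    return 1
  fi
}

# Print the SCC that admitted the Hermes pod; expected to be anyuid.
hermes_scc() {
  oc get pods -n hermes -l app=hermes \
    -o jsonpath='{.items[0].metadata.annotations.openshift\.io/scc}'
}
```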

Create a Service and Route

Create a Service that maps port 8642 to port 8642 on the pod:

apiVersion: v1
kind: Service
metadata:
  name: hermes
  namespace: hermes
  labels:
    app: hermes
spec:
  selector:
    app: hermes
  ports:
    - name: api
      protocol: TCP
      port: 8642
      targetPort: 8642

Create a TLS-terminated Route to expose the endpoint outside the cluster. The Route is optional, but the testing steps and the Open WebUI connection in this post use it:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: hermes
  namespace: hermes
  labels:
    app: hermes
spec:
  to:
    kind: Service
    name: hermes
  port:
    targetPort: api
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect

Apply both and retrieve the assigned hostname:

oc apply -f service.yaml
oc apply -f route.yaml
oc get route hermes -n hermes -o jsonpath='{.spec.host}'
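If requests through the Route later return 503 errors, a common cause is a Service whose selector matches no pods. A hypothetical helper that lists the pod addresses currently registered behind the hermes Service, using the kubernetes.io/service-name label that EndpointSlices carry:

```shell
# Hypothetical helper: list the pod IPs registered behind the hermes Service.
# Empty output means the Service selector currently matches no pods.
hermes_endpoints() {
  oc get endpointslices -n hermes -l kubernetes.io/service-name=hermes \
    -o jsonpath='{.items[*].endpoints[*].addresses[*]}'
}

# Usage: [ -n "$(hermes_endpoints)" ] || echo "no endpoints behind the hermes Service"
```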

Testing the Endpoint

Store the hostname and API key in shell variables to keep the commands readable:

export HERMES_HOST=$(oc get route hermes -n hermes \
  -o jsonpath='{.spec.host}')
export HERMES_KEY=$(oc get secret hermes-api-secret -n hermes \
  -o jsonpath='{.data.API_SERVER_KEY}' | base64 -d)

Verify that the variables are populated before proceeding:

echo "HERMES_HOST: ${HERMES_HOST}"
echo "HERMES_KEY : ${HERMES_KEY}"
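Printing the key leaves it in your terminal scrollback. If you prefer, a hypothetical check like the following confirms both variables are set while only reporting the key length, which should be 64 characters for a value generated with openssl rand -hex 32:

```shell
# Hypothetical check: fail if either variable is empty, without echoing the key.
check_hermes_vars() {
  if [ -z "${HERMES_HOST:-}" ]; then
    echo "HERMES_HOST is empty" >&2
    return 1
  fi
  if [ -z "${HERMES_KEY:-}" ]; then
    echo "HERMES_KEY is empty" >&2
    return 1
  fi
  echo "HERMES_HOST: ${HERMES_HOST}"
  echo "HERMES_KEY length: ${#HERMES_KEY}"
}
```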

List available models:

curl -sS \
  "https://${HERMES_HOST}/v1/models" \
  -H "Authorization: Bearer ${HERMES_KEY}" | jq -r '.data[].id'
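It can also help to confirm that the bearer token is actually enforced. The hypothetical helper below prints only the HTTP status code for GET /v1/models. Without a token you would expect a 4xx response (most likely 401, though the exact code is an assumption here), and with the token a 200:

```shell
# Hypothetical helper: print only the HTTP status for GET /v1/models.
# Extra arguments are passed through to curl, e.g. an Authorization header.
hermes_status() {
  curl -sS -o /dev/null -w '%{http_code}' "https://${HERMES_HOST}/v1/models" "$@"
}

# Usage:
#   hermes_status                                           # expect a 4xx
#   hermes_status -H "Authorization: Bearer ${HERMES_KEY}"  # expect 200
```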

Send a chat completion request, replacing the model value with one of the IDs returned by the previous command:

curl -sS \
  "https://${HERMES_HOST}/v1/chat/completions" \
  -H "Authorization: Bearer ${HERMES_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen2.5-1.5B-Instruct",
    "messages": [{"role": "user", "content": "What is OpenShift?"}]
  }' | jq -r '.choices[0].message.content'

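A successful response confirms that Hermes is running and the API key is accepted. Because a response served through the OpenRouter fallback also succeeds, check the Hermes logs (oc logs deployment/hermes -n hermes) to confirm that the request reached the vLLM server over the internal cluster network.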

To verify the fallback path, scale down the RHAIIS deployment temporarily and send the same request. Hermes should return a response via OpenRouter instead:

oc scale deployment rhaiis-vllm -n rhaiis --replicas=0
# send a request, observe fallback in hermes logs
oc scale deployment rhaiis-vllm -n rhaiis --replicas=1
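Scaling back by hand is easy to forget. The sketch below wraps the three steps in a hypothetical function that records the current replica count, runs whatever request command you pass it, and restores the original count even if that request fails. FALLBACK_WAIT is a hypothetical variable that gives the vLLM pod time to terminate before the request is sent:

```shell
# Hypothetical helper: scale RHAIIS to zero, run one request command, then
# restore the original replica count even if the request fails.
fallback_test() {
  local original rc
  original=$(oc get deployment rhaiis-vllm -n rhaiis -o jsonpath='{.spec.replicas}')
  oc scale deployment rhaiis-vllm -n rhaiis --replicas=0 || return 1
  sleep "${FALLBACK_WAIT:-10}"   # let the vLLM pod terminate first
  "$@"                           # the request to send while vLLM is down
  rc=$?
  oc scale deployment rhaiis-vllm -n rhaiis --replicas="${original:-1}"
  return "$rc"
}

# Usage (any command works, for example the curl request shown earlier):
#   fallback_test curl -sS "https://${HERMES_HOST}/v1/chat/completions" \
#     -H "Authorization: Bearer ${HERMES_KEY}" -H "Content-Type: application/json" \
#     -d '{"model": "<model id>", "messages": [{"role": "user", "content": "ping"}]}'
#   oc logs deployment/hermes -n hermes --tail=50
```

Interrupting the function with Ctrl-C skips the restore step, so check the rhaiis-vllm replica count afterwards in that case.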

Changing the Model

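Update configmap.yaml and set model.default to any model name served by the vLLM instance. The value must match the --served-model-name argument used in the RHAIIS deployment. Apply the updated ConfigMap and restart the Hermes deployment to pick up the change. The restart is required because the ConfigMap is mounted with subPath, and subPath mounts are not updated when the ConfigMap changes: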

oc apply -f configmap.yaml
oc rollout restart deployment/hermes -n hermes
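If you switch models often, the edit can be scripted. A minimal sketch with a hypothetical helper; the sed expression is a simple line match rather than a YAML parser, so it assumes the single default: key shown in the ConfigMap above and a model name without |, & or backslash characters:

```shell
# Hypothetical helper: rewrite the value of the "default:" key in a Hermes
# config (or the ConfigMap that embeds it) read from stdin; writes to stdout.
set_default_model() {
  sed -E "s|^([[:space:]]*default:[[:space:]]*).*\$|\\1\"$1\"|"
}

# Usage sketch:
#   set_default_model "Qwen/Qwen3-Coder-30B-A3B-Instruct" < configmap.yaml > configmap.new \
#     && mv configmap.new configmap.yaml
#   oc apply -f configmap.yaml
#   oc rollout restart deployment/hermes -n hermes
#   oc rollout status deployment/hermes -n hermes --timeout=180s
```

The final oc rollout status call waits until the restarted pod passes its readiness probe.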

Connecting to Open WebUI

Hermes Agent exposes a standard OpenAI-compatible API, so Open WebUI can connect to it directly as an external provider. As with the endpoints in the previous posts, adding Hermes to an existing Open WebUI setup takes only a few settings.

In Open WebUI, go to Settings > Connections and add a new external connection. Set the URL to the Hermes route hostname with the /v1 suffix (https://<route hostname>/v1), enter the Hermes API server key (the API_SERVER_KEY value from hermes-api-secret) as the bearer token, set the provider type to OpenAI, and set the API type to Chat Completions. Leave the model ID field empty so Open WebUI queries the /v1/models endpoint and discovers the available models automatically.

Open WebUI external connection configured against the Hermes Agent endpoint

Once saved, the model appears in the model selector alongside any other configured providers. Requests go from Open WebUI through Hermes to the vLLM server running on the same cluster.

Hermes Agent is available in the Open WebUI interface alongside the model served by RHAIIS

Conclusion

This setup places Hermes Agent inside the same OpenShift cluster as the inference server and routes primary model traffic over the internal service network. The public Hermes API endpoint is secured with its own bearer token, so clients never handle the vLLM key. OpenRouter covers the fallback case without any changes to the application code. The result is a self-hosted agent that uses a self-hosted model for most requests and degrades gracefully when the local server is unavailable.

References

smichard/agent_on_ocp - GitHub repository with deployment files - link
Deploying OpenShift on AWS with Automated Cluster Provisioning - link
Running the Red Hat AI Inference Server on OpenShift - link
Hermes Agent: A Personal AI That Gets More Useful Over Time - link
OpenRouter - link
OpenShift CLI (oc) - link
Hermes Agent - GitHub repository - link
Hermes Agent - Documentation - link
Nous Research - link
