# Cluster Management

This guide covers cluster-wide administration: hardening a basic install for production, ongoing lifecycle operations (scaling, upgrades, dynamic config), and multi-tenancy. For Coordinator architecture and HA, see [Coordinator Management](https://documentation.alluxio.io/ee-ai-en/administration/managing-coordinators). For hash-ring-related operations (worker lifecycle, identity persistence, ring bloat), see [Hash Ring and Worker Lifecycle](https://documentation.alluxio.io/ee-ai-en/administration/managing-ring). For per-worker configuration (storage, resources, JVM, network), see [Worker Configuration](https://documentation.alluxio.io/ee-ai-en/administration/managing-worker).

## 1. Production Setup

The basic configuration shown in the [Kubernetes Installation](https://documentation.alluxio.io/ee-ai-en/start/installing-on-kubernetes) guide is suitable for evaluation. For production deployments, apply the additional settings below for HA, resource tuning, persistent metadata, and worker identity.

### Label Nodes

A common practice is to assign dedicated nodes to each Alluxio component. This prevents resource contention between components (for example, etcd I/O interfering with worker cache I/O) and gives you predictable placement for capacity planning.

```shell
kubectl label nodes <coordinator-node> alluxio-role=coordinator
kubectl label nodes <worker-node-1> alluxio-role=worker
kubectl label nodes <worker-node-2> alluxio-role=worker
kubectl label nodes <worker-node-3> alluxio-role=worker
kubectl label nodes <etcd-node-1> alluxio-role=etcd
kubectl label nodes <etcd-node-2> alluxio-role=etcd
kubectl label nodes <etcd-node-3> alluxio-role=etcd
```
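
To confirm the labels landed as intended, list the nodes with the label shown as a column:

```shell
# -L adds a column showing each node's alluxio-role label
kubectl get nodes -L alluxio-role
```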

Worker pods have an anti-affinity rule by default, so multiple worker pods will not be scheduled on the same node.
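
For reference, the effect is equivalent to a standard Kubernetes pod anti-affinity clause like the sketch below. The operator manages this for you; the sketch reuses the worker component label that appears elsewhere in this guide, but the operator's literal spec may differ:

```yaml
# Illustrative sketch of a required anti-affinity rule that keeps two
# worker pods off the same node; not the operator's literal spec
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/component: worker
        topologyKey: kubernetes.io/hostname
```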

### Production `alluxio-cluster.yaml`

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
metadata:
  name: alluxio-cluster
  namespace: alx-ns
spec:
  image: <PRIVATE_REGISTRY>/alluxio-enterprise
  imageTag: AI-3.8-15.1.0
  properties:
    alluxio.license: <YOUR_CLUSTER_LICENSE>
  coordinator:
    nodeSelector:
      alluxio-role: coordinator
    metastore:
      type: persistentVolumeClaim
      storageClass: "gp2"
      size: 4Gi
    resources:
      # Set requests equal to limits for Guaranteed QoS, same as the worker.
      limits:
        cpu: "8"
        memory: "16Gi"
      requests:
        cpu: "8"
        memory: "16Gi"
    jvmOptions:
      - "-Xmx8g"
      - "-Xms8g"
  worker:
    nodeSelector:
      alluxio-role: worker
    count: 3
    pagestore:
      size: 1000Gi
      reservedSize: 100Gi
    resources:
      # Set requests equal to limits so the worker runs in the Guaranteed
      # QoS class and is the last to be evicted under node pressure.
      limits:
        cpu: "8"
        memory: "24Gi"
      requests:
        cpu: "8"
        memory: "24Gi"
    jvmOptions:
      - "-Xmx12g"
      - "-Xms12g"
      - "-XX:MaxDirectMemorySize=12g"
  etcd:
    replicaCount: 3
    nodeSelector:
      alluxio-role: etcd
```

Key differences from the basic configuration:

* **Node selectors**: Pin each component to dedicated nodes to prevent resource contention and ensure predictable placement. See the label commands above.
* **Worker count**: Set the number of workers based on the target total cache capacity and target throughput. For post-deployment scaling, see [Scaling the Cluster](#scaling-the-cluster).
* **etcd replicas**: 3 for quorum-based HA. Deploy on dedicated, stable nodes.
* **Resource limits and JVM options**: Explicitly set these to prevent out-of-memory kills. The container memory limit must exceed the sum of `-Xmx` and `-XX:MaxDirectMemorySize`. For both workers and the coordinator, set `requests` equal to `limits`; this places the pods in the [Guaranteed QoS class](https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/), so they are the last to be evicted when the node is under memory or CPU pressure. A Burstable pod (requests < limits) can be evicted while still inside its limit if another pod exceeds its request, causing abrupt cache loss (worker) or scheduler disruption (coordinator).
* **Persistent metastore**: Coordinator metadata survives pod restarts.

Other important settings for production deployment:

* **License Management**: A cluster license is the simplest way to get started. For production environments, a **deployment license** is recommended. See [Appendix C: License Management](https://documentation.alluxio.io/ee-ai-en/start/installing-on-kubernetes#c.-license-management) for details on both options.
* **Hash Ring Configuration**: It is critical to configure the hash ring **before** deployment, as changes can be destructive. For detailed guidance, see [Hash Ring Pre-Deployment Configuration](https://documentation.alluxio.io/ee-ai-en/managing-ring#id-1.-pre-deployment-configuration).
* **Worker Identity Persistence**: Configure `worker.systemInfo.hostPath` so workers rejoin with the same UUID after restarts. Without this, each restart adds stale `OFFLINE` entries to the hash ring, degrading cache hit rates. See [Restarting a Worker](https://documentation.alluxio.io/ee-ai-en/managing-ring#restarting-a-worker).
* **Heterogeneous Clusters**: If your cluster includes workers with different capacities, you must define a specific data distribution strategy. See [Heterogeneous Workers](https://documentation.alluxio.io/ee-ai-en/managing-worker#heterogeneous-workers) for configuration steps.
* **Worker Page Store**: The [page store](https://documentation.alluxio.io/ee-ai-en/how-alluxio-works#id-5.-worker-storage-the-page-store) is where each worker caches data. Key defaults and options:
  * **Default:** `type: hostPath`, `hostPath: /mnt/alluxio/pagestore`. The worker writes cache to the node's filesystem at that path. On multi-disk nodes, verify this lands on a data disk, not the system disk.
  * **Multi-disk nodes:** Set `pagestore.hostPath` explicitly to a data disk (e.g. `/mnt/data1/alluxio/pagestore`). See [Multi-Disk Configuration](https://documentation.alluxio.io/ee-ai-en/managing-worker#multi-disk-configuration).
  * **Persistent cache across pod restarts:** Use a PVC instead of hostPath. See [Configuring Page Store Location](https://documentation.alluxio.io/ee-ai-en/managing-worker#configuring-page-store-location).
  * **Sizing:** The `size` parameter sets the cache capacity; `reservedSize` allocates space for internal operations (temporary page writes, file metadata caching). Set `reservedSize` to \~10% of `size` (10–100 GiB) and ensure the total (size + reservedSize) fits within the worker's storage.
* **Advanced Configuration**: For resource and JVM tuning, see [Worker Configuration — Resource and JVM Tuning](https://documentation.alluxio.io/ee-ai-en/managing-worker#id-2.-resource-and-jvm-tuning). For other settings like external etcd, refer to [Appendix B: Advanced Configuration](https://documentation.alluxio.io/ee-ai-en/start/installing-on-kubernetes#b.-advanced-configuration).
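
Once the production `alluxio-cluster.yaml` is ready, apply it and wait for the cluster to report `Ready` (the same status check used during upgrades):

```shell
# Apply the production configuration
$ kubectl apply -f alluxio-cluster.yaml

# The cluster is fully deployed once CLUSTERPHASE shows Ready
$ kubectl -n alx-ns get alluxiocluster
NAME              CLUSTERPHASE   AGE
alluxio-cluster   Ready          12m
```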

### Running Multiple Clusters on Shared Nodes

If multiple Alluxio clusters are deployed across different namespaces on the same Kubernetes cluster, pods from different clusters may be scheduled onto the same node, causing deployment failures. Label nodes to indicate which cluster they belong to:

```shell
kubectl label nodes <node-name> cluster=alluxio-a
kubectl label nodes <node-name> cluster=alluxio-b
```

Then specify the `nodeSelector` at the cluster level in each `alluxio-cluster.yaml`:

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  image: <PRIVATE_REGISTRY>/alluxio-enterprise
  imageTag: AI-3.8-15.1.0
  nodeSelector:
    cluster: alluxio-a
```

## 2. Cluster Lifecycle and Configuration

This section covers fundamental operations related to the cluster's lifecycle, such as scaling, upgrades, and dynamic configuration updates.

### Scaling the Cluster

You can dynamically scale the number of Alluxio workers up or down to adjust to workload changes.

**To Scale Up Workers:**

1. Modify your `alluxio-cluster.yaml` file and increase the `count` under the `worker` section. The example below scales from 2 to 3 workers.
2. Apply the change to your cluster.
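
A minimal sketch of the relevant fragment:

```yaml
spec:
  worker:
    count: 3  # increased from 2
```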

```shell
# Apply the changes to Kubernetes
$ kubectl apply -f alluxio-cluster.yaml
alluxiocluster.k8s-operator.alluxio.com/alluxio-cluster configured

# Verify the new worker pods are being created
$ kubectl -n alx-ns get pod
NAME                                          READY   STATUS            RESTARTS   AGE
...
alluxio-cluster-worker-58999f8ddd-p6n59       0/1     PodInitializing   0          4s

# Wait for all workers to become ready
$ kubectl -n alx-ns get pod -l app.kubernetes.io/component=worker
NAME                                          READY   STATUS    RESTARTS   AGE
alluxio-cluster-worker-58999f8ddd-cd6r2       1/1     Running   0          5m21s
alluxio-cluster-worker-58999f8ddd-rtftk       1/1     Running   0          4m21s
alluxio-cluster-worker-58999f8ddd-p6n59       1/1     Running   0          34s
```

### Upgrading Alluxio

The upgrade process involves two main steps: upgrading the Alluxio Operator and then upgrading the Alluxio cluster itself.

#### Step 1: Upgrade the Operator

The operator is stateless and can be safely re-installed without affecting the running Alluxio cluster.

1. Obtain the new Docker images for the operator and the new Helm chart.
2. Uninstall the old operator and install the new one.

```shell
# Uninstall the current operator
$ helm -n alluxio-operator uninstall operator --wait
release "operator" uninstalled

# From the new Helm chart directory, create any new CRDs and replace the existing ones
$ kubectl create -f alluxio-operator/crds
$ kubectl replace -f alluxio-operator/crds
customresourcedefinition.apiextensions.k8s.io/alluxioclusters.k8s-operator.alluxio.com replaced
customresourcedefinition.apiextensions.k8s.io/clustergroups.k8s-operator.alluxio.com replaced
customresourcedefinition.apiextensions.k8s.io/collectinfoes.k8s-operator.alluxio.com replaced
customresourcedefinition.apiextensions.k8s.io/licenses.k8s-operator.alluxio.com replaced
customresourcedefinition.apiextensions.k8s.io/underfilesystems.k8s-operator.alluxio.com replaced

# Install the new operator from the new chart directory using your configuration file (update the image tag first)
$ helm -n alluxio-operator install operator -f alluxio-operator.yaml --create-namespace .
```
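
Before upgrading the cluster itself, confirm the new operator pods are back up:

```shell
# All operator pods should reach Running before proceeding
$ kubectl -n alluxio-operator get pod
```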

#### Step 2: Upgrade the Alluxio Cluster

The operator will perform a rolling upgrade of the Alluxio components.

1. Upload the new Alluxio Docker images to your registry.
2. Update the `imageTag` in your `alluxio-cluster.yaml` to the new version.
3. Apply the configuration change.

```shell
# Apply the updated cluster definition
$ kubectl apply -f alluxio-cluster.yaml
alluxiocluster.k8s-operator.alluxio.com/alluxio-cluster configured

# Monitor the rolling upgrade process
$ kubectl -n alx-ns get pod
NAME                                          READY   STATUS     RESTARTS   AGE
alluxio-cluster-coordinator-0                 0/1     Init:0/2   0          7s
...
alluxio-cluster-worker-58999f8ddd-cd6r2       0/1     Init:0/2   0          7s
alluxio-cluster-worker-5d6786f5bf-cxv5j       1/1     Running    0          10m

# Check the cluster status until it returns to 'Ready'
$ kubectl -n alx-ns get alluxiocluster
NAME              CLUSTERPHASE   AGE
alluxio-cluster   Updating       10m
...
NAME              CLUSTERPHASE   AGE
alluxio-cluster   Ready          12m

# Verify the new version is running
$ kubectl -n alx-ns exec -it alluxio-cluster-coordinator-0 -- alluxio info version 2>/dev/null
AI-3.8-15.1.0
```

During the rolling upgrade, workers are restarted in batches; all workers in the current batch must be fully ready before the next batch starts. The default batch size is 10% of the workers.

Reducing the batch size minimizes interruption to running workloads, at the cost of a longer upgrade. To control the proportion, or to set an exact number of workers to restart at a time, set the following in `alluxio-cluster.yaml`:

```yaml
apiVersion: k8s-operator.alluxio.com/v1
kind: AlluxioCluster
spec:
  worker:
    rollingUpdate:
      maxUnavailable: 1  # default is 10%; accepts a percentage or an exact number of workers
```
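
For example, to restart at most a quarter of the workers per batch, use a percentage instead:

```yaml
spec:
  worker:
    rollingUpdate:
      maxUnavailable: 25%  # larger batches shorten the upgrade but take more workers offline at once
```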

### Dynamically Updating Configuration

You can change Alluxio properties in a running cluster by editing its ConfigMap.

1. **Find the ConfigMap** for your cluster.

   ```console
   $ kubectl -n alx-ns get configmap
   NAME                              DATA   AGE
   alluxio-cluster-alluxio-conf      4      7m48s
   ...
   ```
2. **Edit the ConfigMap** to modify `alluxio-site.properties`, `alluxio-env.sh`, etc.

   ```console
   $ kubectl -n alx-ns edit configmap alluxio-cluster-alluxio-conf
   ```
3. **Restart components** to apply the changes.
   * **Coordinator:** `kubectl -n alx-ns rollout restart statefulset alluxio-cluster-coordinator`
   * **Workers:** `kubectl -n alx-ns rollout restart deployment alluxio-cluster-worker`
   * **DaemonSet FUSE:** `kubectl -n alx-ns rollout restart daemonset alluxio-fuse`
   * **CSI FUSE:** These pods follow the lifecycle of the application pod that mounts the volume; restart them by restarting that application pod, or by manually deleting the FUSE pod (`kubectl -n alx-ns delete pod <fuse-pod-name>`).
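
After the restart completes, you can read a property back from a running pod to confirm the new value took effect. A sketch, assuming the `alluxio conf get` subcommand available in this release and a placeholder property name:

```shell
# <property-name> is a placeholder for the property you changed
$ kubectl -n alx-ns exec -it alluxio-cluster-coordinator-0 -- alluxio conf get <property-name>
```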

## 3. Multi-Tenancy and Federation

For large-scale enterprise deployments, Alluxio provides advanced features for multi-tenancy and cluster federation. This allows multiple teams and business units to share data infrastructure securely and efficiently while simplifying administrative overhead.

The reference architecture centers on an API Gateway that centrally handles authentication and authorization across multiple Alluxio clusters; its components are described below.

### Core Concepts

#### Authentication

Alluxio integrates with external enterprise identity providers like **Okta**. When a user logs in, the provider authenticates them and generates a [**JSON Web Token (JWT)**](https://auth0.com/docs/secure/tokens/access-tokens/access-token-profiles). This JWT is then sent with every subsequent request to the Alluxio API Gateway to verify the user's identity.
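
In practice, the token travels in a standard `Authorization` header on each call. A minimal sketch, in which the gateway host and path are illustrative placeholders rather than a documented Alluxio endpoint:

```shell
# Illustrative only: present the Okta-issued JWT to the API Gateway
curl -H "Authorization: Bearer $JWT" \
  https://alluxio-gateway.example.internal/api/v1/clusters
```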

#### Authorization

Once a user is authenticated, Alluxio uses an external policy engine, [**Open Policy Agent (OPA)**](https://www.openpolicyagent.org/docs/latest/#running-opa), to determine what actions the user is authorized to perform. Administrators can write fine-grained access control policies in OPA's declarative language, **Rego**, to control which users can access which resources. The API Gateway queries OPA for every request to ensure it is authorized.
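
To make the gateway-to-OPA hop concrete, the following sketch shows a decision query against OPA's standard REST data API; the policy package (`alluxio/authz`) and input schema are illustrative assumptions, not Alluxio's actual contract:

```shell
# Illustrative only: ask OPA whether this request should be allowed.
# The package path and input fields below are assumptions for the example.
curl -s -X POST http://opa.example.internal:8181/v1/data/alluxio/authz/allow \
  -H 'Content-Type: application/json' \
  -d '{
        "input": {
          "user":     {"role": "tenant-admin", "tenant": "team-a"},
          "action":   "update_cache_policy",
          "resource": {"tenant": "team-a"}
        }
      }'
# A permitting policy returns: {"result": true}
```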

#### Multi-Tenancy and Isolation

Alluxio enforces isolation between tenants to ensure security and prevent interference. This is achieved through:

* **User Roles:** Defining different roles with specific access levels and permissions.
* **Cache Isolation:** Assigning tenant-specific cache configurations, including quotas, TTLs, and eviction policies, ensuring one tenant's workload does not negatively impact another's.

### Cluster Federation

For organizations with multiple Alluxio clusters (e.g., across different regions or for different business units), federation simplifies management. A central **Management Console** provides a single pane of glass for:

* Cross-cluster monitoring and metrics.
* Executing operations across multiple clusters simultaneously.
* Centralized license management for all clusters.

### Example Workflow: Updating a Cache Policy

This workflow demonstrates how the components work together:

1. **Authentication:** A user logs into the **Management Console**, which redirects them to **Okta** for authentication. Upon success, Okta issues a JWT.
2. **Request Submission:** The user uses the console to submit a request to change a cache TTL. The request, containing the JWT, is sent to the **API Gateway**.
3. **Authorization:** The API Gateway validates the JWT and queries the **OPA Policy Engine** to check if the user has permission to modify cache settings for the target tenant.
4. **Execution:** If the request is authorized, the API Gateway forwards the command to the coordinator of the relevant Alluxio cluster, which then applies the new TTL policy.
