Overview

The Data Orchestration Hub provides an easy-to-use, unified management view for configuration and monitoring, along with wizard-based curation of deployment workflows.

  • Connect Your Data Sources: Connect Alluxio to data storage and catalogs across multiple clouds, a single cloud, or on-premises using guided wizards.

  • Monitor Your Alluxio Cluster: View the state of the processes running on your Alluxio cluster from a dashboard.

  • Manage Configuration: Set and distribute configuration for a cluster.

When to Use

Data Orchestration Hub agents are co-deployed on the cluster running Alluxio. The Hub can connect to multiple Alluxio clusters across environments. Further instructions can be found in the deployment section below.

Once connected to an Alluxio cluster, the Hub can be used to modify the state of the Alluxio cluster, such as updating configuration and restarting processes. The following scenarios illustrate usage of the Hub web interface.

Scenario A: Managing an Alluxio cluster

The Hub provides a dashboard for monitoring the state of processes on the cluster. From the same interface you can update configuration and restart processes.

Monitor the status of an Alluxio cluster anywhere. You can start or stop cluster components from an intuitive UI.

Scenario B: Connecting to data sources across regions

Alluxio is used to connect a compute cluster with data sources across private data centers and public clouds, potentially over a wide area network. The Hub uses a self-guided, wizard-based approach to let users connect to data sources and catalogs in the same or remote data centers. The wizard guides the user through the required configuration steps and validates the connection.

These wizards apply to multiple scenarios, including hybrid cloud, cross-data-center, single cloud, and private data center deployments.

Connect Alluxio to all your data sources across multiple clouds, a single cloud, or on-premises using self-guided wizards.

Further usage scenarios and descriptions of the available toolset can be found in the sections below.

Deployment

The Hub consists of the following components deployed on your Alluxio cluster.

  • Hub Manager: The Hub Manager serves requests to Alluxio processes via Hub Agents, and registers and communicates with the hosted Hub UI. It is a process that runs on the same node as an Alluxio Master by default and provides the REST endpoints that serve UI requests. When using multiple Alluxio masters, any node can be chosen to deploy the Hub Manager. To gain access to the Hub UI, please contact [email protected].

  • Hub Agent: The Hub agents are deployed on both Alluxio Masters and Alluxio Workers. These agent processes serve requests from the Hub Manager to make changes to the cluster without SSH access.

The following diagram illustrates the Hub architecture:

Hub Agents must be present on all managed nodes whereas the Hub Manager is a single instance.

Choose your compute environment to see how to deploy Data Orchestration Hub.

AWS EMR

No manual steps are required. When Alluxio is deployed using the Amazon EMR [bootstrap action](../cloud/AWS-EMR.md), the Hub comes pre-deployed.

Google Dataproc

No manual steps are required. When Alluxio is deployed using the Google Dataproc [initialization action](../cloud/Google-Dataproc.md), the Hub comes pre-deployed.

Kubernetes

When Alluxio is deployed using helm, set the property `hub.enabled` to `true` to deploy the Hub on the Kubernetes cluster.
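As a minimal sketch, the property can be supplied through a Helm values file. Only `hub.enabled` comes from the text above; the helper names, the temporary values file, and the release and chart arguments are placeholders chosen for illustration and must be replaced with the values for your own installation.

```shell
# Sketch: write a Helm values override that enables the Hub.
# write_hub_values and install_with_hub are hypothetical helper names.
write_hub_values() {
  # $1: path of the values file to create
  cat > "$1" <<'EOF'
hub:
  enabled: true
EOF
}

install_with_hub() {
  # $1: release name, $2: chart reference (both supplied by you)
  values_file="$(mktemp)"
  write_hub_values "$values_file"
  helm upgrade --install "$1" "$2" -f "$values_file"
}
```

Calling `install_with_hub <release> <chart>` installs or upgrades the release with the Hub enabled. Keeping the override in a values file rather than a `--set` flag makes it easier to add further Hub properties later.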

Architecture

  • Hub agent containers are deployed alongside the master and worker pods as sidecar containers. The Hub manager runs in a separate pod associated with a Kubernetes Deployment and Service.

  • Restarting a process through the UI restarts only the container, not the pod. For example, if you restart the Job Worker process, only one container in the worker pod is restarted.

  • A configMap is used to persist configuration for all processes across pod restarts. This includes alluxio-env.sh, alluxio-site.properties, and log4j.properties. The Hub manager uses a Kubernetes Service Account to access the API Server. If using helm, specify a service account name using the property hub.serviceAccount.

To view the Hub manager UI, you may run the following command:

$ kubectl port-forward $(kubectl get pods | grep alluxio-hub | cut -d ' ' -f1) 30077:30077

RBAC

In Kubernetes clusters with RBAC enabled, please configure a service account and RBAC role with sufficient permissions to access the Kubernetes API server. Either a Role or ClusterRole can be created, though the following instructions guide you through creating a Role.

A. Create a service account for the Hub manager deployment:

$ kubectl create serviceaccount alluxio

B. Create a role with permissions to update Alluxio's configMap:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: alluxio-role
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["alluxio-config"]
  verbs: ["get", "update"]

C. Bind the role created to the service account:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: alluxio-role-binding
  namespace: default
subjects:
- kind: ServiceAccount
  name: alluxio
  apiGroup: ""
roleRef:
  kind: Role
  name: alluxio-role
  apiGroup: rbac.authorization.k8s.io

D. Check permissions: Verify that the service account has permissions to update the configMap.

$ kubectl auth can-i update configmap/alluxio-config --as=system:serviceaccount:default:alluxio

To view a complete list of permissions, execute the following:

$ kubectl auth can-i --list --as=system:serviceaccount:default:alluxio
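The role above grants both `get` and `update`, so it can be useful to check both verbs in one pass. The following is a sketch only: `check_hub_rbac` is a hypothetical helper name, and it assumes the `default` namespace and the `alluxio` service account used in the steps above.

```shell
# Sketch: confirm the service account holds every verb granted by alluxio-role.
# check_hub_rbac is a hypothetical helper, not part of Alluxio or kubectl.
check_hub_rbac() {
  sa="system:serviceaccount:default:alluxio"
  for verb in get update; do
    # `kubectl auth can-i` prints "yes" or "no"; ignore its exit status
    # so that a "no" answer is reported by the check below.
    answer="$(kubectl auth can-i "$verb" configmap/alluxio-config --as="$sa")" || true
    if [ "$answer" != "yes" ]; then
      echo "missing permission: $verb"
      return 1
    fi
  done
  echo "all permissions present"
}
```

Run `check_hub_rbac` after applying the Role and RoleBinding. A non-zero exit status indicates which verb is still missing.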

Limitations

  • A single configMap is shared across process types. Any configuration specific to a process type, other than the Master, may be lost on pod restart. To avoid potential loss, we recommend symmetric configuration definitions across processes.

Standalone

For all other deployment environments, the Hub processes must be started on the cluster nodes. Note that the binaries and scripts to launch the Hub are bundled with the Alluxio Enterprise tarball.

After extracting the tarball, run the scripts described under Starting the Hub below from the Alluxio installation directory.

The recommended OS user for the Hub Manager is alluxio (the same user that runs the Alluxio processes themselves), and for the Hub Agents it is root. Hub agents require root privileges to modify configuration and restart compute engines like Presto. If this functionality is not required, Hub agents can still be started with the OS user alluxio.

To get more information and get access to the Hub, please contact [[email protected]](mailto:[email protected]).

Getting Started

The Hub web interface is a hosted service that gives users a single access point to connect and interact with their Alluxio clusters via Hub Manager/Agents.

Generating an API key

An API/secret key pair is required to authenticate the Hub Manager with the Hosted Hub. Before you start the Hub Manager and Hub Agents, you will need to access the Hub UI and generate an API/secret key pair. Once generated, you must set alluxio.hub.authentication.apiKey and alluxio.hub.authentication.secretKey in the Hub Manager's alluxio-site.properties.

Click on the "New API Key" button and follow the prompts to generate an API/secret key pair.
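Once the key pair has been generated, the two properties can be appended to the properties file. This is a minimal sketch: `set_hub_keys` is a hypothetical helper name, and the file location shown in the comment assumes the usual `conf` directory under the Alluxio installation.

```shell
# Sketch: record the generated key pair in the Hub Manager's properties file.
# set_hub_keys is a hypothetical helper, not an Alluxio tool.
set_hub_keys() {
  conf_file="$1"   # assumed location: ${ALLUXIO_HOME}/conf/alluxio-site.properties
  api_key="$2"
  secret_key="$3"
  # Append both properties in a single redirection.
  {
    printf 'alluxio.hub.authentication.apiKey=%s\n' "$api_key"
    printf 'alluxio.hub.authentication.secretKey=%s\n' "$secret_key"
  } >> "$conf_file"
}
```

The helper only appends, so running it twice leaves duplicate entries. Remove any earlier values by hand before rotating keys.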

Starting the Hub

To start the Hub Manager on the primary master node only:

$ ${ALLUXIO_HOME}/bin/alluxio-start.sh -a hub_manager

To start the Hub Agents on all nodes, execute the following on each node:

$ ${ALLUXIO_HOME}/bin/alluxio-start.sh -a hub_agent

Stopping the Hub

To stop the Hub, execute the following on the node where the Hub Manager was started:

$ ${ALLUXIO_HOME}/bin/alluxio-stop.sh -a hub_manager
$ ${ALLUXIO_HOME}/bin/alluxio-stop.sh -a hub_agent

Execute the following on all other nodes:

$ ${ALLUXIO_HOME}/bin/alluxio-stop.sh -a hub_agent

Configuration

For a complete list of properties applicable to the Hub, please search for properties prefixed with alluxio.hub on this page.

Note:

  • alluxio.hub.hosted.rpc.hostname (required) specifies the address of the Hosted Hub for the Hub manager to register with.

  • alluxio.hub.authentication.apiKey (required) is the API key used (with the secret key) to authenticate the Hub manager.

  • alluxio.hub.authentication.secretKey (required) is the secret key used (with the API key) to authenticate the Hub manager.

  • alluxio.hub.cluster.label (optional) can be set to label the cluster to help identify it when managing multiple clusters.

All other properties are optional. These properties should be set before starting the Hub processes. The mechanism for setting them varies depending on the compute environment chosen in the deployment section above.
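Because the Hub processes need the required properties at startup, a quick check of the properties file beforehand can catch omissions. The sketch below assumes the properties are written as `name=value` lines in a file such as alluxio-site.properties; `check_hub_props` is a hypothetical helper name, not an Alluxio tool.

```shell
# Sketch: fail early if a required Hub property is absent from a properties file.
# check_hub_props is a hypothetical helper, not an Alluxio tool.
check_hub_props() {
  conf_file="$1"
  missing=0
  for prop in \
    alluxio.hub.hosted.rpc.hostname \
    alluxio.hub.authentication.apiKey \
    alluxio.hub.authentication.secretKey
  do
    # Accept "name=value" with optional spaces around "=" and a non-empty value.
    if ! grep -q "^[[:space:]]*${prop}[[:space:]]*=[[:space:]]*[^[:space:]]" "$conf_file"; then
      echo "missing required property: ${prop}"
      missing=1
    fi
  done
  return "$missing"
}
```

The function exits with status 0 only when all three required properties are present, so it can guard a start script, for example `check_hub_props "$conf" && ${ALLUXIO_HOME}/bin/alluxio-start.sh -a hub_manager`.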

What next

Once deployed, you can visit the Hub at the URL provided by Alluxio (the same value as alluxio.hub.hosted.rpc.hostname). Sign in using the configured username and password.

Sign in using the admin credentials. Default: Username = 'alluxio', Password = 'alluxio'.

In the console you have access to the following:

If you have multiple Alluxio clusters, you can connect all of them to the Hub and have access to the features listed above for each cluster.

Click on a cluster to access the selected cluster's processes dashboard, configuration wizard, and much more.
