Google Cloud GCS

This guide describes how to configure Alluxio with Google Cloud Storage (GCS) as the under storage system.

Google Cloud Storage (GCS) is a scalable and durable object storage service offered by Google Cloud Platform (GCP). It allows users to store and retrieve various types of data, including unstructured and structured data.

For more information about GCS, please read its documentation.

Prerequisites

Before you get started, please ensure you have the required information listed below:

<GCS_BUCKET>: The GCS bucket you want to use, either by creating a new bucket or using an existing one

<GCS_DIRECTORY>: The directory you want to use in the bucket, either by creating a new directory or using an existing one

The default GCS UFS module (GCS version 2) is built on the Google Cloud API, which accepts Google application credentials. Fine-grained permissions can be defined when creating the application credentials to limit access to specific buckets.

Basic Setup

Use the mount table operations to add a new mount point, specifying the Alluxio path to create the mount on and the GCS path as the UFS URI. Credentials and configuration options can also be specified as part of the mount operation, as described in configuring mount points.

An example ufs.yaml to create a mount point with the operator:

apiVersion: k8s-operator.alluxio.com/v1
kind: UnderFileSystem
metadata:
  name: alluxio-gs
  namespace: alx-ns
spec:
  alluxioCluster: alluxio-cluster
  path: gs://<GCS_BUCKET>/<GCS_DIRECTORY>
  mountPath: /gs
  mountOptions:
    fs.gcs.credential.path: /path/to/<google_application_credentials>.json

An example command to mount gs://<GCS_BUCKET>/<GCS_DIRECTORY> to /gs using GCS v2 if not using the operator:

bin/alluxio mount add --path /gs/ --ufs-uri gs://<GCS_BUCKET>/<GCS_DIRECTORY> \
  --option fs.gcs.credential.path=/path/to/<google_application_credentials>.json

The property key fs.gcs.credential.path provides the path to the Google application credentials JSON file. The credentials file must be placed on every Alluxio node at the same path. If the nodes running the Alluxio processes already contain the GCS credentials, this property may not be needed, but setting it explicitly is always recommended.
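
Because a malformed or misplaced credentials file usually surfaces only as a failed mount, it can help to sanity-check the file on each node beforehand. The following is a minimal sketch and not part of Alluxio: both function names are hypothetical, and the list of required fields assumes a standard Google service account key file.

```python
import json

# Fields assumed to be present in a Google service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def missing_credential_fields(credentials):
    """Return the required fields that are absent or empty in a parsed key file."""
    return [name for name in REQUIRED_FIELDS if not credentials.get(name)]


def check_credentials_file(path):
    """Parse the JSON file at `path` and return its missing fields (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        credentials = json.load(handle)
    if not isinstance(credentials, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return missing_credential_fields(credentials)
```

Calling check_credentials_file with the same path passed to fs.gcs.credential.path should return an empty list on every node; a non-empty list or an exception suggests the file is missing, truncated, or not a service account key.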

In a Kubernetes environment, the credentials file should be provided as a secret. See how to add a secret as a file.

Advanced Setup

Customize the Directory Suffix

Directories are represented in GCS as zero-byte objects named with a specified suffix. The directory suffix can be updated with the configuration parameter alluxio.underfs.gcs.directory.suffix.
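
To make this representation concrete, the sketch below derives the key of the object that would stand in for a directory. It is an illustration only: directory_marker_key is a hypothetical helper rather than Alluxio code, it assumes a non-empty directory path, and the suffix values shown in the note afterwards are assumed examples.

```python
def directory_marker_key(directory, suffix):
    """Return the key of the zero-byte object that represents `directory`.

    `directory` is a non-empty path relative to the bucket root, and `suffix`
    is the value configured for alluxio.underfs.gcs.directory.suffix.
    """
    # Drop leading and trailing slashes so the suffix is appended exactly once.
    trimmed = directory.strip("/")
    return trimmed + suffix
```

With an assumed suffix of /, the directory data/logs would be represented by the object data/logs/. With an assumed suffix of _$folder$, it would be data/logs_$folder$.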

GCS Access Control

If Alluxio security is enabled, Alluxio enforces the access control inherited from the underlying object storage.

The GCS credentials specified in the Alluxio configuration represent a GCS user. The GCS service backend checks that user's permissions on the bucket and the object for access control. If the given GCS user does not have access permissions to the specified bucket, a permission denied error is thrown. When Alluxio security is enabled, Alluxio loads the bucket ACL when the metadata is first loaded into the Alluxio namespace.

Mapping from GCS ACL to Alluxio permission

Alluxio checks the GCS bucket READ/WRITE ACL to determine the owner's permission mode for an Alluxio file. For example, if the GCS user has read-only access to the underlying bucket, the mounted directory and files have mode 0500. If the GCS user has full access to the underlying bucket, the mounted directory and files have mode 0700.
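
The two cases above can be sketched as a small function. This is an illustration of the described mapping, not Alluxio's implementation: owner_mode is a hypothetical name, full access is modeled as read plus write, and the result for write-only access is an assumption that the text above does not cover.

```python
def owner_mode(can_read, can_write):
    """Map bucket-level READ/WRITE access to an owner permission mode."""
    mode = 0
    if can_read:
        # Read access grants owner read and execute (list/traverse) bits.
        mode |= 0o500
    if can_write:
        # Write access adds the owner write bit (assumed for write-only access).
        mode |= 0o200
    return mode
```

For instance, owner_mode(True, False) yields 0o500 for a read-only user, and owner_mode(True, True) yields 0o700 for a user with full access.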

Accessing GCS through Proxy

If the Alluxio cluster is behind a corporate proxy or a firewall, the Alluxio GCS integration may not be able to access the internet with the default settings.

Add the following Java options to conf/alluxio-env.sh before starting the Alluxio coordinator and workers.

ALLUXIO_COORDINATOR_JAVA_OPTS+=" -Dhttps.proxyHost=<proxy_host> -Dhttps.proxyPort=<proxy_port> -Dhttp.proxyHost=<proxy_host> -Dhttp.proxyPort=<proxy_port> -Dhttp.nonProxyHosts=<non_proxy_host>"
ALLUXIO_WORKER_JAVA_OPTS+=" -Dhttps.proxyHost=<proxy_host> -Dhttps.proxyPort=<proxy_port> -Dhttp.proxyHost=<proxy_host> -Dhttp.proxyPort=<proxy_port> -Dhttp.nonProxyHosts=<non_proxy_host>"

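An example value for http.nonProxyHosts is localhost|127.*|[::1]|192.168.*. Entries are separated by | and may begin or end with the * wildcard; CIDR notation such as 192.168.0.0/16 is not recognized by Java.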

If the proxy requires a username and password, also add the http.proxyUser, https.proxyUser, http.proxyPassword, and https.proxyPassword Java options.
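
Since the same option string is repeated for the coordinator and the workers, it can be generated once to avoid typos. The helper below is a minimal sketch: proxy_java_opts and its parameters are hypothetical names, and it only assembles the property names listed in this section.

```python
def proxy_java_opts(host, port, non_proxy_hosts=None, user=None, password=None):
    """Build the -D proxy options string for ALLUXIO_*_JAVA_OPTS."""
    opts = []
    # Same host and port for HTTPS and HTTP, in the order used in this guide.
    for scheme in ("https", "http"):
        opts.append(f"-D{scheme}.proxyHost={host}")
        opts.append(f"-D{scheme}.proxyPort={port}")
    if non_proxy_hosts:
        opts.append(f"-Dhttp.nonProxyHosts={non_proxy_hosts}")
    # Credentials are only added when both a user and a password are given.
    if user is not None and password is not None:
        for scheme in ("http", "https"):
            opts.append(f"-D{scheme}.proxyUser={user}")
            opts.append(f"-D{scheme}.proxyPassword={password}")
    return " ".join(opts)
```

The returned string can be pasted between the quotes of both ALLUXIO_COORDINATOR_JAVA_OPTS+=" ..." and ALLUXIO_WORKER_JAVA_OPTS+=" ..." so the two settings stay identical.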
