Google Cloud GCS
This guide describes how to configure Alluxio with Google Cloud Storage (GCS) as the under storage system.
Google Cloud Storage (GCS) is a scalable and durable object storage service offered by Google Cloud Platform (GCP). It allows users to store and retrieve various types of data, including unstructured and structured data.
For more information about GCS, please read its documentation.
Prerequisites
Before you get started, please ensure you have the required information listed below:
<GCS_BUCKET>
Create a new bucket in your Google Cloud account or use an existing bucket
<GCS_DIRECTORY>
The directory you want to use in the bucket, either by creating a new directory or using an existing one
The default GCS UFS module (GCS version 2) is built on the Google Cloud API, which accepts Google application credentials. Fine-grained permissions can be defined when creating the application credentials to limit access to specific buckets.
Basic Setup
Use the mount table operations to add a new mount point, specifying the Alluxio path to create the mount on and the GCS path as the UFS URI. Credentials and configuration options can also be specified as part of the mount operation, as described in configuring mount points.
An example ufs.yaml to create a mount point with the operator:
apiVersion: k8s-operator.alluxio.com/v1
kind: UnderFileSystem
metadata:
  name: alluxio-gs
  namespace: alx-ns
spec:
  alluxioCluster: alluxio-cluster
  path: gs://<GCS_BUCKET>/<GCS_DIRECTORY>
  mountPath: /gs
  mountOptions:
    fs.gcs.credential.path: /path/to/<google_application_credentials>.json
An example command to mount gs://<GCS_BUCKET>/<GCS_DIRECTORY> to /gs using GCS v2 if not using the operator:
bin/alluxio mount add --path /gs/ --ufs-uri gs://<GCS_BUCKET>/<GCS_DIRECTORY> \
--option fs.gcs.credential.path=/path/to/<google_application_credentials>.json
The property key fs.gcs.credential.path provides the path to the Google application credentials JSON file.
Note that the Google application credentials JSON file must be placed at the same path on every Alluxio node.
If the nodes running the Alluxio processes already contain the GCS credentials, this property may not be needed, but setting it explicitly is always recommended.
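Because the credentials file has to exist at the same path on every node, a quick per-node sanity check can catch a missing or misplaced file before mounting. The following is a minimal sketch, not part of Alluxio: the function name check_gcs_credentials is hypothetical, and the check assumes the JSON file contains a top-level "type" field, as Google credential files normally do.

```shell
# Hypothetical helper (not an Alluxio command): verify that the credentials
# file is readable on this node and looks like a Google credentials JSON file.
check_gcs_credentials() {
  local cred_path="$1"
  if [ ! -r "$cred_path" ]; then
    echo "missing or unreadable: $cred_path" >&2
    return 1
  fi
  # Assumption: Google credential files carry a "type" field.
  if ! grep -q '"type"[[:space:]]*:' "$cred_path"; then
    echo "no \"type\" field found in: $cred_path" >&2
    return 2
  fi
  echo "ok: $cred_path"
}
```

Running check_gcs_credentials with the same path that is passed to fs.gcs.credential.path on each Alluxio node should print an "ok" line everywhere before the mount is added.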
In a Kubernetes environment, the credentials file should be provided as a secret. See how to add a secret as a file.
Advanced Setup
Customize the Directory Suffix
Directories are represented in GCS as zero-byte objects named with a specified suffix. The directory suffix can be changed with the configuration parameter alluxio.underfs.gcs.directory.suffix.
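To make the naming rule concrete, the sketch below derives the name of the zero-byte marker object for a directory by appending the suffix to the directory path. It is an illustration only: gcs_dir_marker_key is a hypothetical helper, the default suffix of / is an assumption, and _dir_ is a made-up suffix value.

```shell
# Illustration only: compute the marker object name for a directory.
# Assumption: the suffix defaults to "/" when none is given.
gcs_dir_marker_key() {
  local dir="$1"
  local suffix="${2:-/}"
  dir="${dir%/}"                    # strip one trailing slash, if present
  printf '%s%s\n' "$dir" "$suffix"  # marker name = directory path + suffix
}

gcs_dir_marker_key "data/logs"          # prints data/logs/
gcs_dir_marker_key "data/logs" "_dir_"  # prints data/logs_dir_
```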
GCS Access Control
If Alluxio security is enabled, Alluxio enforces the access control inherited from the underlying object storage.
The GCS credentials specified in the Alluxio configuration represent a GCS user. The GCS service backend checks that user's permissions on the bucket and the object for access control. If the given GCS user does not have access permissions to the specified bucket, a permission denied error is thrown. When Alluxio security is enabled, Alluxio loads the bucket ACL when the metadata is first loaded into the Alluxio namespace.
Mapping from GCS ACL to Alluxio permission
Alluxio checks the GCS bucket READ/WRITE ACL to determine the owner's permission mode for an Alluxio file. For example, if the GCS user has read-only access to the underlying bucket, the mounted directory and files have mode 0500. If the GCS user has full access to the underlying bucket, the mounted directory and files have mode 0700.
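The two cases above can be summarized as a small lookup. This is a sketch of the described mapping, not Alluxio's implementation: gcs_acl_to_mode is a hypothetical function, and the access level names read-only and full are labels chosen for this example. Any other access level is rejected because the text does not define a mode for it.

```shell
# Illustration of the ACL-to-mode mapping described above.
# Hypothetical helper; the access level names are chosen for this example.
gcs_acl_to_mode() {
  case "$1" in
    read-only) echo "0500" ;;  # owner can read and traverse, not write
    full)      echo "0700" ;;  # owner can read, write, and traverse
    *)
      echo "no mode defined for access level: $1" >&2
      return 1
      ;;
  esac
}

gcs_acl_to_mode read-only   # prints 0500
gcs_acl_to_mode full        # prints 0700
```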
Accessing GCS through Proxy
If the Alluxio cluster is behind a corporate proxy or a firewall, the Alluxio GCS integration may not be able to access the internet with the default settings.
Add the following Java options to conf/alluxio-env.sh before starting the Alluxio coordinator and workers.
ALLUXIO_COORDINATOR_JAVA_OPTS+=" -Dhttps.proxyHost=<proxy_host> -Dhttps.proxyPort=<proxy_port> -Dhttp.proxyHost=<proxy_host> -Dhttp.proxyPort=<proxy_port> -Dhttp.nonProxyHosts=<non_proxy_host>"
ALLUXIO_WORKER_JAVA_OPTS+=" -Dhttps.proxyHost=<proxy_host> -Dhttps.proxyPort=<proxy_port> -Dhttp.proxyHost=<proxy_host> -Dhttp.proxyPort=<proxy_port> -Dhttp.nonProxyHosts=<non_proxy_host>"
An example value for http.nonProxyHosts is localhost|127.*|[::1]|192.168.0.0/16.
If a username and password are required for the proxy, add the http.proxyUser, https.proxyUser, http.proxyPassword, and https.proxyPassword Java options.
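Since the coordinator and the workers take the same proxy flags, one way to avoid repeating them in conf/alluxio-env.sh is to build the option string once. The sketch below assumes this approach; build_proxy_opts and PROXY_OPTS are hypothetical names, and the angle-bracket values are the same placeholders used above. The user and password arguments are optional and are only emitted when a user is given.

```shell
# Hypothetical helper for conf/alluxio-env.sh: assemble the JVM proxy flags.
# Arguments: host, port, non-proxy hosts, optional user, optional password.
build_proxy_opts() {
  local host="$1"
  local port="$2"
  local non_proxy="$3"
  local user="${4:-}"
  local password="${5:-}"
  local opts="-Dhttps.proxyHost=${host} -Dhttps.proxyPort=${port}"
  opts="${opts} -Dhttp.proxyHost=${host} -Dhttp.proxyPort=${port}"
  opts="${opts} -Dhttp.nonProxyHosts=${non_proxy}"
  if [ -n "$user" ]; then
    # Only add the credential flags when the proxy requires authentication.
    opts="${opts} -Dhttp.proxyUser=${user} -Dhttps.proxyUser=${user}"
    opts="${opts} -Dhttp.proxyPassword=${password} -Dhttps.proxyPassword=${password}"
  fi
  printf '%s' "$opts"
}

PROXY_OPTS="$(build_proxy_opts "<proxy_host>" "<proxy_port>" "<non_proxy_host>")"
ALLUXIO_COORDINATOR_JAVA_OPTS="${ALLUXIO_COORDINATOR_JAVA_OPTS:-} ${PROXY_OPTS}"
ALLUXIO_WORKER_JAVA_OPTS="${ALLUXIO_WORKER_JAVA_OPTS:-} ${PROXY_OPTS}"
```

Passing a user and password as the fourth and fifth arguments appends the four credential options. Note that values set this way are visible in the process arguments of the JVM.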