Google Cloud Storage
This guide describes how to configure Alluxio with Google Cloud Storage (GCS) as the under storage system.
The Alluxio binaries must be on your machine. You can either compile them from source or download them.
In preparation for using GCS with Alluxio, create a bucket (or use an existing bucket). Also note the directory you want to use in that bucket, either a new directory you create or an existing one. For the purposes of this guide, the GCS bucket is called GCS_BUCKET, and the directory in that bucket is called GCS_DIRECTORY.
For more information on GCS, please read its documentation.
Alluxio provides two ways to access GCS. GCS version 1 is implemented on top of the jets3t library, which is designed for AWS S3. It therefore accepts only a Google Cloud Storage interoperability access/secret key pair, which grants full access to all Google Cloud Storage resources inside a Google Cloud project. No permission or access control can be placed on the interoperability keys. The combination of the Google interoperability API and the jets3t library also hurts the performance of the version 1 GCS UFS module.
The default GCS UFS module (GCS version 2) is implemented on top of the Google Cloud API, which accepts Google application credentials. Based on the application credentials, Google Cloud can determine what permissions an authenticated client has for its target Google Cloud Storage bucket. In addition, GCS version 2 performs much better than version 1 in metadata and read/write operations.
A GCS bucket can be mounted to Alluxio either at the root of the namespace or at a nested directory.
Configure Alluxio to use under storage systems by modifying conf/alluxio-site.properties. If it does not exist, create the configuration file from the template.
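The template step can be sketched as below. The helper name `ensure_site_properties` is hypothetical, and the template file name `alluxio-site.properties.template` is an assumption about the layout of the Alluxio distribution, so check it against your installation.

```shell
# Hypothetical helper: create alluxio-site.properties from the template
# only when the properties file does not exist yet.
ensure_site_properties() {
  conf_dir="$1"
  if [ -f "${conf_dir}/alluxio-site.properties" ]; then
    echo "exists"
  elif [ -f "${conf_dir}/alluxio-site.properties.template" ]; then
    cp "${conf_dir}/alluxio-site.properties.template" \
       "${conf_dir}/alluxio-site.properties"
    echo "created"
  else
    # Assumed template name was not found; nothing is changed.
    echo "no-template"
  fi
}

# Assumption: ALLUXIO_HOME points at the installation; defaults to the current directory.
ensure_site_properties "${ALLUXIO_HOME:-$PWD}/conf"
```

The helper never overwrites an existing properties file, so it is safe to run more than once.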
Configure Alluxio to use GCS as its root under storage system. The first modification is to specify an existing GCS bucket and directory as the under storage system by modifying conf/alluxio-site.properties to include:
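The property line itself is missing from this copy of the guide. A minimal sketch, assuming the Alluxio 2.x key `alluxio.master.mount.table.root.ufs` (verify the key against the configuration reference for your version), appends it from the shell:

```shell
# Assumption: ALLUXIO_HOME points at the installation; defaults to the current directory.
SITE_PROPS="${ALLUXIO_HOME:-$PWD}/conf/alluxio-site.properties"
mkdir -p "$(dirname "${SITE_PROPS}")"

# GCS_BUCKET and GCS_DIRECTORY are the placeholders used throughout this guide.
# Appending twice duplicates the line, so inspect the file first on a real install.
cat >> "${SITE_PROPS}" <<'EOF'
alluxio.master.mount.table.root.ufs=gs://GCS_BUCKET/GCS_DIRECTORY
EOF
```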
Choose your preferred GCS UFS version and provide the corresponding Google credentials.
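For GCS version 2, a sketch of the credential settings could look like the following. The keys `alluxio.underfs.gcs.version` and `fs.gcs.credential.path` are assumptions based on Alluxio 2.x and do not appear in this text, and the JSON path is a placeholder for your own Google application credentials file.

```shell
# Assumption: ALLUXIO_HOME points at the installation; defaults to the current directory.
SITE_PROPS="${ALLUXIO_HOME:-$PWD}/conf/alluxio-site.properties"
mkdir -p "$(dirname "${SITE_PROPS}")"

# Select the version 2 module and point it at a Google application credentials file.
# Both keys are assumed from Alluxio 2.x; the path is a placeholder.
cat >> "${SITE_PROPS}" <<'EOF'
alluxio.underfs.gcs.version=2
fs.gcs.credential.path=/path/to/<google_application_credentials>.json
EOF
```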
First, within conf/alluxio-site.properties, specify the master host:
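For a single-machine setup, the entry might be written as in this sketch. The key `alluxio.master.hostname` and the value `localhost` are assumptions, since the original line is not present here.

```shell
# Assumption: ALLUXIO_HOME points at the installation; defaults to the current directory.
SITE_PROPS="${ALLUXIO_HOME:-$PWD}/conf/alluxio-site.properties"
mkdir -p "$(dirname "${SITE_PROPS}")"

# Assumed key and value for a local, single-node deployment.
cat >> "${SITE_PROPS}" <<'EOF'
alluxio.master.hostname=localhost
EOF
```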
Then, mount GCS:
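The mount command is not included in this copy. A hedged sketch, assuming the Alluxio 2.x `fs mount` syntax, a hypothetical credentials path, and `/gcs` as an illustrative Alluxio mount point:

```shell
# Assumption: ALLUXIO_HOME points at the installation; defaults to the current directory.
ALLUXIO_HOME="${ALLUXIO_HOME:-$PWD}"
GCS_URI="gs://GCS_BUCKET/GCS_DIRECTORY"
# Hypothetical path to a Google application credentials file.
CREDENTIALS="/path/to/credentials.json"

if [ -x "${ALLUXIO_HOME}/bin/alluxio" ]; then
  # Mount the GCS location at the nested Alluxio path /gcs (illustrative name).
  "${ALLUXIO_HOME}/bin/alluxio" fs mount \
    --option fs.gcs.credential.path="${CREDENTIALS}" \
    /gcs "${GCS_URI}"
else
  echo "bin/alluxio not found under ${ALLUXIO_HOME}; nothing mounted." >&2
fi
```

Passing the credentials with `--option` keeps them scoped to this mount point instead of the whole cluster configuration.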
Start up Alluxio locally to see that everything works.
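A local start-up could look like the sketch below, which assumes the Alluxio 2.x scripts `bin/alluxio` and `bin/alluxio-start.sh`. Script names and arguments differ between releases, so treat them as assumptions.

```shell
# Assumption: ALLUXIO_HOME points at the installation; defaults to the current directory.
ALLUXIO_HOME="${ALLUXIO_HOME:-$PWD}"

if [ -x "${ALLUXIO_HOME}/bin/alluxio" ] && [ -x "${ALLUXIO_HOME}/bin/alluxio-start.sh" ]; then
  # Formatting resets Alluxio's own journal and worker storage;
  # skip it on a deployment whose Alluxio state you want to keep.
  "${ALLUXIO_HOME}/bin/alluxio" format
  "${ALLUXIO_HOME}/bin/alluxio-start.sh" local
  START_STATUS="started"
else
  echo "Alluxio scripts not found under ${ALLUXIO_HOME}; skipping start." >&2
  START_STATUS="skipped"
fi
```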
Run a simple example program:
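One way to do this, assuming the `runTests` subcommand of the Alluxio 2.x command line (not named in this text), is:

```shell
# Assumption: ALLUXIO_HOME points at the installation; defaults to the current directory.
ALLUXIO_HOME="${ALLUXIO_HOME:-$PWD}"

if [ -x "${ALLUXIO_HOME}/bin/alluxio" ]; then
  # Runs the bundled sample read/write operations against the running cluster.
  "${ALLUXIO_HOME}/bin/alluxio" runTests
  TEST_STATUS="ran"
else
  echo "bin/alluxio not found under ${ALLUXIO_HOME}; skipping tests." >&2
  TEST_STATUS="skipped"
fi
```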
Visit your GCS directory GCS_BUCKET/GCS_DIRECTORY to verify the files and directories created by Alluxio exist. For this test, you should see files named like:
To stop Alluxio, you can run:
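The stop command is missing here as well. A sketch assuming the Alluxio 2.x script `bin/alluxio-stop.sh`:

```shell
# Assumption: ALLUXIO_HOME points at the installation; defaults to the current directory.
ALLUXIO_HOME="${ALLUXIO_HOME:-$PWD}"

if [ -x "${ALLUXIO_HOME}/bin/alluxio-stop.sh" ]; then
  # Stops the locally running master and worker processes.
  "${ALLUXIO_HOME}/bin/alluxio-stop.sh" local
  STOP_STATUS="stopped"
else
  echo "bin/alluxio-stop.sh not found under ${ALLUXIO_HOME}; nothing to stop." >&2
  STOP_STATUS="skipped"
fi
```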
If Alluxio security is enabled, Alluxio enforces the access control inherited from the underlying object storage.
The GCS credentials specified in the Alluxio configuration represent a GCS user. The GCS service backend checks that user's permissions on the bucket and the object for access control. If the given GCS user does not have the required access permission on the specified bucket, a permission denied error is thrown. When Alluxio security is enabled, Alluxio loads the bucket ACL into Alluxio permissions the first time the metadata is loaded into the Alluxio namespace.
Alluxio checks the GCS bucket READ/WRITE ACL to determine the owner's permission mode for an Alluxio file. For example, if the GCS user has read-only access to the underlying bucket, the mounted directory and files have mode 0500. If the GCS user has full access to the underlying bucket, the mounted directory and files have mode 0700.
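The two cases above can be illustrated with a small lookup. This is only a sketch of the mapping as described in the text, not Alluxio's implementation, and the function name `gcs_access_to_mode` is hypothetical.

```shell
# Hypothetical helper mirroring the two documented cases:
# read-only bucket access -> 0500, full (read and write) access -> 0700.
gcs_access_to_mode() {
  can_read="$1"   # "true" or "false"
  can_write="$2"  # "true" or "false"
  if [ "${can_read}" = "true" ] && [ "${can_write}" = "true" ]; then
    echo "0700"
  elif [ "${can_read}" = "true" ]; then
    echo "0500"
  else
    # The text does not describe other combinations, so they are left undefined.
    echo "unspecified"
  fi
}

gcs_access_to_mode true false   # read-only user
gcs_access_to_mode true true    # user with full access
```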
If you want to share the GCS mount point with other users in the Alluxio namespace, you can enable alluxio.underfs.object.store.mount.shared.publicly.
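Enabling the property might look like the sketch below. The value `true` is an assumption that the property is a boolean switch.

```shell
# Assumption: ALLUXIO_HOME points at the installation; defaults to the current directory.
SITE_PROPS="${ALLUXIO_HOME:-$PWD}/conf/alluxio-site.properties"
mkdir -p "$(dirname "${SITE_PROPS}")"

# Assumed boolean value; the property name is the one given in the text.
cat >> "${SITE_PROPS}" <<'EOF'
alluxio.underfs.object.store.mount.shared.publicly=true
EOF
```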
Commands such as chown, chgrp, and chmod on Alluxio directories and files do NOT propagate to the underlying GCS buckets or objects.
If the Alluxio cluster is behind a corporate proxy or a firewall, the Alluxio GCS integration may not be able to access the internet with the default settings.
Add the following Java options to conf/alluxio-env.sh before starting the Alluxio masters and workers.
An example value for http.nonProxyHosts is localhost|127.*|[::1]|192.168.0.0/16.
If a username and password are required for the proxy, add the http.proxyUser, https.proxyUser, http.proxyPassword, and https.proxyPassword Java options.
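The proxy options themselves are not shown in this copy. A sketch using the standard Java networking properties is given below. The variable names `ALLUXIO_MASTER_JAVA_OPTS` and `ALLUXIO_WORKER_JAVA_OPTS` are assumptions about conf/alluxio-env.sh, and the angle-bracket values are placeholders to replace.

```shell
# Assumption: ALLUXIO_HOME points at the installation; defaults to the current directory.
ENV_FILE="${ALLUXIO_HOME:-$PWD}/conf/alluxio-env.sh"
mkdir -p "$(dirname "${ENV_FILE}")"

# The quoted heredoc keeps the placeholders literal; replace them before use.
cat >> "${ENV_FILE}" <<'EOF'
ALLUXIO_MASTER_JAVA_OPTS+=" -Dhttps.proxyHost=<proxy_host> -Dhttps.proxyPort=<proxy_port> -Dhttp.proxyHost=<proxy_host> -Dhttp.proxyPort=<proxy_port> -Dhttp.nonProxyHosts=<non_proxy_host>"
ALLUXIO_WORKER_JAVA_OPTS+=" -Dhttps.proxyHost=<proxy_host> -Dhttps.proxyPort=<proxy_port> -Dhttp.proxyHost=<proxy_host> -Dhttp.proxyPort=<proxy_port> -Dhttp.nonProxyHosts=<non_proxy_host>"
EOF
```

Proxy credentials, when needed, would be added to the same two variables with the user and password options listed above.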
The first property key tells Alluxio to load the version 1 GCS UFS module, which uses the jets3t library.
Replace <GCS_ACCESS_KEY_ID> and <GCS_SECRET_ACCESS_KEY> with your actual GCS interoperability access keys, or with environment variables that contain your credentials. Note: GCS interoperability is disabled by default. Open the Interoperability tab in the GCS settings and enable this feature, then click Create a new key to get the access key and secret pair.
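The version 1 properties this passage refers to are not reproduced here. A sketch, assuming the Alluxio 2.x keys `alluxio.underfs.gcs.version`, `fs.gcs.accessKeyId`, and `fs.gcs.secretAccessKey` (verify them against your version's configuration reference):

```shell
# Assumption: ALLUXIO_HOME points at the installation; defaults to the current directory.
SITE_PROPS="${ALLUXIO_HOME:-$PWD}/conf/alluxio-site.properties"
mkdir -p "$(dirname "${SITE_PROPS}")"

# The first key selects the version 1 module; the other two carry the
# interoperability key pair. All three key names are assumptions.
cat >> "${SITE_PROPS}" <<'EOF'
alluxio.underfs.gcs.version=1
fs.gcs.accessKeyId=<GCS_ACCESS_KEY_ID>
fs.gcs.secretAccessKey=<GCS_SECRET_ACCESS_KEY>
EOF
```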
After these changes, Alluxio should be configured to work with GCS as its under storage system, and you can run Alluxio locally with GCS.
A GCS location can be mounted at a nested directory in the Alluxio namespace to provide unified access to multiple under storage systems. Alluxio's command line interface can be used for this purpose.
This should start an Alluxio master and an Alluxio worker. You can confirm that both are running in the master web UI.
Directories are represented in GCS as zero-byte objects named with a specified suffix. The directory suffix can be changed through a configuration parameter.
By default, Alluxio tries to extract the GCS user ID from the credentials. Optionally, alluxio.underfs.gcs.owner.id.to.username.mapping can be used to specify a static mapping from GCS owner IDs to Alluxio usernames, in the format id1=user1;id2=user2. The Google Cloud Storage IDs can be found in the Google Cloud console. Please use the "Owners" one.
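To make the mapping format concrete, the sketch below resolves an owner ID against a mapping string. It illustrates the id1=user1;id2=user2 format only and is not how Alluxio parses the property. The helper name `lookup_username` and the sample IDs are hypothetical.

```shell
# Sample value as it would appear in conf/alluxio-site.properties:
#   alluxio.underfs.gcs.owner.id.to.username.mapping=id1=user1;id2=user2
MAPPING="id1=user1;id2=user2"

# Hypothetical helper: print the username mapped to an owner ID, or nothing.
lookup_username() {
  # $1: mapping string, $2: GCS owner ID
  printf '%s\n' "$1" | tr ';' '\n' | while IFS='=' read -r id user; do
    if [ "${id}" = "$2" ]; then
      echo "${user}"
    fi
  done
}

lookup_username "${MAPPING}" id2
```

Entries are separated by semicolons, and each entry splits into an ID and a username at its equals sign.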