Kerberos (Java)
This documentation describes how to set up an Alluxio cluster with Kerberos security, using a single AWS EC2 Linux machine running Alluxio locally as an example. To set up a cluster on multiple nodes, replace the host field (localhost) in the Kerberos principals with <your.cluster.name>. Alternatively, use hostname-associated principal names and leave alluxio.security.kerberos.unified.instance.name unset.
Some frequently seen problems and questions are listed at the end of the document.
Setup Key Distribution Center (KDC)
Add Principals And Generate Keytab Files in KDC
Setup client-side Kerberos on Alluxio cluster
Please set up a standalone KDC before doing this. The KDC plays the role of a server, providing authentication service to clients. All the other nodes that contact the KDC for authentication are considered clients. In a Kerberized Alluxio cluster, all the Alluxio nodes need to contact the KDC as clients. Therefore, follow this guide to set up the Kerberos client-side packages and configuration on each node in the Alluxio cluster (not the KDC node). Kerberos clients also need an /etc/krb5.conf to communicate with the KDC. The Kerberos client settings also work if you want to set up a local Alluxio cluster on Mac OS X.
Here is a sample /etc/krb5.conf on an Alluxio node:
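As a minimal sketch of what such a client configuration might contain, the shell function below prints a krb5.conf for a given realm and KDC host. The realm ALLUXIO.COM matches the principals used in this guide; the host kdc.example.com, the function name print_krb5_conf, and the lifetime values are illustrative assumptions rather than values from an actual deployment.

```shell
# Print a minimal client-side krb5.conf to stdout.
# $1 = realm (e.g. ALLUXIO.COM), $2 = KDC hostname (hypothetical placeholder).
print_krb5_conf() {
  realm="$1"
  kdc_host="$2"
  # The [domain_realm] section maps the lowercase DNS domain to the realm.
  domain=$(printf '%s' "$realm" | tr '[:upper:]' '[:lower:]')
  cat <<EOF
[libdefaults]
    default_realm = ${realm}
    dns_lookup_realm = false
    dns_lookup_kdc = false
    ticket_lifetime = 24h
    renew_lifetime = 7d
    forwardable = true

[realms]
    ${realm} = {
        kdc = ${kdc_host}
        admin_server = ${kdc_host}
    }

[domain_realm]
    .${domain} = ${realm}
    ${domain} = ${realm}
EOF
}

# Example: review the output before installing it as /etc/krb5.conf.
print_krb5_conf ALLUXIO.COM kdc.example.com
```

Printing to stdout instead of writing /etc/krb5.conf directly makes it easy to compare the result with the file on the KDC before installing it on each Alluxio node.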
Setup Alluxio Cluster with Kerberos Security
This section is about how to enable Kerberized secure communication between Alluxio components (i.e. clients, masters, and workers). This, however, does not cover the communication between Alluxio and any under storage. To enable Kerberos between Alluxio and a secure UFS, e.g. Kerberized HDFS, please refer to the section on integration with secure HDFS.
Create the users alluxio, client, and foo on the machines where you will install Alluxio. The user alluxio corresponds to the Kerberos principal alluxio/localhost@ALLUXIO.COM, client corresponds to client/localhost@ALLUXIO.COM, and foo corresponds to foo/localhost@ALLUXIO.COM. The user alluxio will be the Alluxio service user that starts, manages, and stops Alluxio servers. This user does not have to be called alluxio in your own deployment; it can be an arbitrary string as long as it complies with the naming rules of the underlying operating system.
$ sudo adduser alluxio
$ sudo adduser client
$ sudo adduser foo
$ sudo passwd alluxio
$ sudo passwd client
$ sudo passwd foo
Alluxio server processes, e.g. masters and workers, will run as user alluxio, so please add alluxio to sudoers so that the user has permission to access ramdisks.
Add the following lines to the end of /etc/sudoers (or use visudo as root)
# User privilege specification
alluxio ALL=(ALL) NOPASSWD:ALL
Then, distribute the server and client keytab files from the KDC to each node of the Alluxio cluster. Save them in a secure location and set the user and group permissions accordingly. The following snippets save the keytab files into /etc/alluxio/conf; create this directory on each Alluxio node if it does not exist.
$ scp -i ~/your_aws_key_pair.pem <KDC_DNS_NAME>:alluxio.keytab /etc/alluxio/conf/
$ scp -i ~/your_aws_key_pair.pem <KDC_DNS_NAME>:client.keytab /etc/alluxio/conf/
$ scp -i ~/your_aws_key_pair.pem <KDC_DNS_NAME>:foo.keytab /etc/alluxio/conf/
$ sudo chown alluxio:alluxio /etc/alluxio/conf/alluxio.keytab
$ sudo chown client:alluxio /etc/alluxio/conf/client.keytab
$ sudo chown foo:alluxio /etc/alluxio/conf/foo.keytab
$ sudo chmod 0440 /etc/alluxio/conf/alluxio.keytab
$ sudo chmod 0440 /etc/alluxio/conf/client.keytab
$ sudo chmod 0440 /etc/alluxio/conf/foo.keytab
The owner of each keytab file should be the user who needs to access it.
To transfer files from Windows to Linux, you can use scp through Cygwin, or use pscp.exe in PuTTY.
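After copying the keytabs, it can help to confirm that each file has the mode and owner set above. The following is a small sketch, not part of Alluxio; the function name check_keytab is hypothetical, and it assumes GNU stat as found on Linux.

```shell
# Succeed only if the keytab has mode 440 and is owned by the expected user.
# $1 = keytab path, $2 = expected owner name.
# Uses GNU stat (-c); BSD/macOS stat takes different flags.
check_keytab() {
  keytab="$1"
  expected_owner="$2"
  [ -f "$keytab" ] || { echo "missing: $keytab" >&2; return 1; }
  mode=$(stat -c '%a' "$keytab")
  owner=$(stat -c '%U' "$keytab")
  [ "$mode" = "440" ] || { echo "$keytab: mode $mode, expected 440" >&2; return 1; }
  [ "$owner" = "$expected_owner" ] || { echo "$keytab: owner $owner, expected $expected_owner" >&2; return 1; }
  echo "$keytab: ok"
}
```

For example, check_keytab /etc/alluxio/conf/client.keytab client should print "ok" on a node prepared as described above. With MIT Kerberos, klist -k -t on a keytab additionally lists the principals it contains.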
Server Configuration
Login as alluxio by executing the following:
$ su - alluxio
All the operations required for the rest of the server configuration should be performed by user alluxio.
When installing Alluxio, you can add the following configuration properties to alluxio-site.properties.
alluxio.security.authentication.type=KERBEROS
alluxio.security.authorization.permission.enabled=true
alluxio.security.kerberos.unified.instance.name=localhost
alluxio.security.kerberos.server.principal=alluxio/localhost@ALLUXIO.COM
alluxio.security.kerberos.server.keytab.file=/etc/alluxio/conf/alluxio.keytab
In versions before 2.1.0, alluxio.security.kerberos.service.name is also required:
alluxio.security.kerberos.service.name=alluxio
Note:
- alluxio.security.kerberos.service.name was a required parameter before Alluxio 2.1.0. It was removed in 2.1.0 because the service name can be extracted from the server principal, alluxio.security.kerberos.server.principal.
- alluxio.security.kerberos.server.principal is a required parameter in a JAAS environment. It should be in the format <primary>/<instance>@REALM.COM. The <instance> name of the server principal must match the server hostname, i.e. alluxio.master.hostname or alluxio.worker.hostname. When Alluxio starts, the server principal is propagated to clients via cluster defaults. If cluster defaults are disabled by alluxio.user.conf.cluster.default.enabled=false, the clients must be configured with the server principal explicitly.
- alluxio.security.kerberos.unified.instance.name is optional; set it when all the Alluxio servers share a single principal with a unified instance name. If it is not specified, the <instance> name in alluxio.security.kerberos.server.principal must match the server hostname, i.e. alluxio.master.hostname or alluxio.worker.hostname.
- Instance names must be lowercase. If the hostnames contain uppercase letters, make sure to convert them to lowercase when creating the principals and specifying them in the Alluxio configuration.
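To illustrate the principal format and the lowercase rule from the notes above, here is a minimal sketch. The helper names make_principal and principal_primary and the hostname Node1.Example.COM are hypothetical; only the <primary>/<instance>@REALM layout comes from this guide.

```shell
# Build <primary>/<instance>@<REALM>, lowercasing the instance (hostname) part.
# $1 = primary (e.g. alluxio), $2 = hostname, $3 = realm.
make_principal() {
  instance=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  printf '%s/%s@%s\n' "$1" "$instance" "$3"
}

# Extract the primary (the service name) from a principal string.
# Works with or without an instance component.
principal_primary() {
  p="${1%%@*}"            # drop the @REALM suffix
  printf '%s\n' "${p%%/*}" # drop the /instance suffix, if any
}

make_principal alluxio Node1.Example.COM ALLUXIO.COM
principal_primary alluxio/localhost@ALLUXIO.COM
```

The second helper mirrors the idea that the service name can be read from the server principal, which is why the separate service name property is no longer needed from 2.1.0 onward.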
Once the installation and configuration are complete, start the Alluxio service by executing the following:
$ ./bin/alluxio format
$ ./bin/alluxio-start.sh local SudoMount
Client Configuration
Client-side access to an Alluxio cluster requires the following configuration. Note that the server keytab file is not required for the client; the keytab file permissions are configured so that client users cannot access the server keytab file.
alluxio.security.authentication.type=KERBEROS
alluxio.security.authorization.permission.enabled=true
alluxio.security.kerberos.unified.instance.name=localhost
alluxio.security.kerberos.client.principal=client/localhost@ALLUXIO.COM
alluxio.security.kerberos.client.keytab.file=/etc/alluxio/conf/client.keytab
You can switch users by changing the client principal and keytab pair.
An alternative client Kerberos login option is to leave the client principal and keytab out of the above configuration file, and instead log in manually by invoking kinit on the client machines.
$ kinit -k -t /etc/alluxio/conf/client.keytab client/localhost@ALLUXIO.COM
This makes it more convenient to switch between different principals, but requires manual renewal when the Kerberos ticket expires. For Alluxio clients used in long-running services, specify the client principal and keytab in the configuration file so that Alluxio handles ticket renewal automatically.
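When relying on kinit, the manual renewal step can be scripted. The sketch below is one possible approach, assuming MIT Kerberos, whose klist -s runs silently and exits non-zero when the credential cache is missing or expired. The function name ensure_ticket is hypothetical.

```shell
# Obtain a ticket from the keytab only when the cache holds no valid one.
# $1 = keytab path, $2 = principal.
ensure_ticket() {
  # klist -s (MIT Kerberos) prints nothing; its exit status reports
  # whether a readable, unexpired credential cache exists.
  if klist -s; then
    echo "valid ticket found in cache"
  else
    kinit -k -t "$1" "$2" && echo "obtained new ticket for $2"
  fi
}
```

A call such as ensure_ticket /etc/alluxio/conf/client.keytab client/localhost@ALLUXIO.COM could be run before each batch of Alluxio commands. This is a convenience for interactive or scheduled use; long-running services are still better served by the principal and keytab properties.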
Invalid principal/keytab combinations, or failure to find a valid Kerberos credential in the ticket cache, result in the following error message, which indicates that the user cannot log in via Kerberos.
Failed to login: <detailed reason>
Please see the FAQ section for more details about login failures.
Run Sample Tests
After Alluxio is configured and installed, you can run simple tests that write several files to Alluxio and the configured UFS.
$ ./bin/alluxio runTests
Example
You can play with the following examples to verify that the Alluxio cluster you set up is indeed Kerberos-enabled.
First, act as superuser alluxio by setting the following configurations in conf/alluxio-site.properties:
alluxio.security.kerberos.client.principal=alluxio/localhost@ALLUXIO.COM
alluxio.security.kerberos.client.keytab.file=/etc/alluxio/conf/alluxio.keytab
Create some directories for different users via the Alluxio filesystem shell:
$ ./bin/alluxio fs ls /
$ ./bin/alluxio fs mkdir /admin
$ ./bin/alluxio fs mkdir /client
$ ./bin/alluxio fs chown client /client
$ ./bin/alluxio fs chgrp client /client
$ ./bin/alluxio fs mkdir /foo
$ ./bin/alluxio fs chown foo /foo
$ ./bin/alluxio fs chgrp foo /foo
Now, you have /admin owned by user alluxio, /client owned by user client, and /foo owned by user foo.
If you change one or both of the above configurations to empty or a wrong value, then the Kerberos authentication should fail, so any command in ./bin/alluxio fs should fail too.
Second, act as user client by re-configuring conf/alluxio-site.properties:
alluxio.security.kerberos.client.principal=client/localhost@ALLUXIO.COM
alluxio.security.kerberos.client.keytab.file=/etc/alluxio/conf/client.keytab
Create some directories and put some files into Alluxio:
$ ./bin/alluxio fs ls -R /
$ ./bin/alluxio fs mkdir /client/dir
$ ./bin/alluxio fs copyFromLocal conf/alluxio-site.properties /client/file
$ ./bin/alluxio fs rm -R /client/dir
# This will fail
$ ./bin/alluxio fs mkdir /foo/bar
# This will fail
$ ./bin/alluxio fs rm -R /foo
The last two commands should fail since user client has no write permission to /foo, which is owned by user foo.
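When scripting these checks, a small helper can turn an expected failure into a passing step. This is a sketch under the assumption that the Alluxio shell exits with a non-zero status when a command is denied; the function name expect_failure is hypothetical.

```shell
# Run a command that is expected to fail; succeed only if it does fail.
# Usage: expect_failure <command> [args...]
expect_failure() {
  if "$@" >/dev/null 2>&1; then
    echo "UNEXPECTED SUCCESS: $*" >&2
    return 1
  fi
  echo "failed as expected: $*"
}
```

For instance, while acting as user client, expect_failure ./bin/alluxio fs mkdir /foo/bar should report "failed as expected" if the permission checks are in effect.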
Similarly, switch to user foo and try the filesystem shell:
alluxio.security.kerberos.client.principal=foo/localhost@ALLUXIO.COM
alluxio.security.kerberos.client.keytab.file=/etc/alluxio/conf/foo.keytab
$ ./bin/alluxio fs ls -R /
$ ./bin/alluxio fs mkdir /foo/bar
$ ./bin/alluxio fs copyFromLocal conf/alluxio-site.properties /foo/bar/testfile
# This will fail
$ ./bin/alluxio fs copyFromLocal conf/alluxio-site.properties /client/foofile
The last command should fail because user foo has no write permission to /client, which is owned by user client.
Alternatively, if the Kerberos credential cache is of type DIR or FILE, the client can log in by loading the credentials from the cache instead of the keytab file. Leave the keytab property empty and obtain a ticket with kinit:
alluxio.security.kerberos.client.principal=client/localhost@ALLUXIO.COM
alluxio.security.kerberos.client.keytab.file=
$ kinit -k -t /etc/alluxio/conf/client.keytab client/localhost@ALLUXIO.COM
This has the same effect as configuring the client keytab file. You can validate this by running examples similar to those above:
$ ./bin/alluxio fs ls -R /
$ ./bin/alluxio fs mkdir /client/dir
$ ./bin/alluxio fs copyFromLocal conf/alluxio-site.properties /client/file
$ ./bin/alluxio fs rm -R /client/dir
$ ./bin/alluxio fs mkdir /foo/bar
$ ./bin/alluxio fs rm -R /foo
Using Delegation Token
When Kerberized Alluxio is used with a Kerberized Hadoop cluster, Alluxio can be configured to use delegation tokens instead of client principals and keytabs on compute nodes. Delegation tokens reduce the workload on the KDC by greatly reducing the number of requests sent to it when a compute job starts. They also remove the need to deploy a client keytab to every compute node, which makes Alluxio clients easier to deploy and maintain. It is recommended to use delegation tokens whenever possible.
To enable delegation token on Alluxio, first configure the compute frameworks to obtain delegation tokens from Alluxio.
First, please add Alluxio client jar location to YARN resource manager class path:
export HADOOP_CLASSPATH=<PATH_TO_ALLUXIO_CLIENT_JAR>:${HADOOP_CLASSPATH}
Replace <PATH_TO_ALLUXIO_CLIENT_JAR> with the actual Alluxio client jar location on the YARN resource manager node. After the change, please restart the resource manager.
For Spark, please add the following property to spark-defaults.conf and restart Spark and YARN:
spark.yarn.access.hadoopFileSystems=<ALLUXIO_ROOT_URL>
Replace <ALLUXIO_ROOT_URL> with the actual Alluxio URL starting with alluxio://. In single master mode, this URL can be alluxio://<HOSTNAME>:<PORT>/. In HA mode, this URL should be alluxio://<ALLUXIO_SERVICE_ALIAS>/.
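The two URL shapes can be captured in a short helper. This is an illustrative sketch only: the function name alluxio_root_url, the host master1.example.com, and the alias my-alluxio-service are hypothetical, and 19998 is used on the assumption that the master runs on Alluxio's default RPC port.

```shell
# Print the Alluxio root URL to use as <ALLUXIO_ROOT_URL>.
# One argument  -> HA mode:            alluxio://<ALLUXIO_SERVICE_ALIAS>/
# Two arguments -> single master mode: alluxio://<HOSTNAME>:<PORT>/
alluxio_root_url() {
  case $# in
    1) printf 'alluxio://%s/\n' "$1" ;;
    2) printf 'alluxio://%s:%s/\n' "$1" "$2" ;;
    *) echo "usage: alluxio_root_url <alias> | <host> <port>" >&2; return 1 ;;
  esac
}

alluxio_root_url master1.example.com 19998
alluxio_root_url my-alluxio-service
```

The printed value is what goes on the right-hand side of spark.yarn.access.hadoopFileSystems.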
For MapReduce, please add the following property and restart YARN:
mapreduce.job.hdfs-servers=<ALLUXIO_ROOT_URL>
Replace <ALLUXIO_ROOT_URL> with the actual Alluxio URL starting with alluxio://.
In order to eliminate the requirement of client keytab on compute nodes, capability should also be enabled on Alluxio cluster. Please set the following property in alluxio-site.properties on all Alluxio nodes:
alluxio.security.authorization.capability.enabled=true
Also make sure the client keytab and principal are not set in either the client or the server configuration. That is, the following properties should be absent:
alluxio.security.kerberos.client.principal=<CLIENT_PRINCIPAL>
alluxio.security.kerberos.client.keytab.file=<CLIENT_KEYTAB>
Please restart Alluxio and the corresponding compute framework clients after the configuration change.
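A quick way to confirm that neither property remains in a configuration file is to search for them. The sketch below uses a hypothetical function name, check_no_client_credentials, and makes the simplifying assumption that properties are written in key=value form.

```shell
# Fail if a properties file still sets the client principal or keytab.
# $1 = path to alluxio-site.properties. Commented-out lines are ignored.
check_no_client_credentials() {
  # Any match is printed so the offending lines are easy to find.
  if grep -E '^[[:space:]]*alluxio\.security\.kerberos\.client\.(principal|keytab\.file)[[:space:]]*=' "$1"; then
    echo "remove the properties listed above from $1" >&2
    return 1
  fi
  echo "$1: no client principal or keytab configured"
}
```

Running check_no_client_credentials conf/alluxio-site.properties on every Alluxio node and compute node before the restart gives a simple pass or fail answer.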
Kerberos-enabled Alluxio Integration with Secure-HDFS as UFS
If there is an existing Secure-HDFS with Kerberos enabled, here are the instructions to set up Alluxio to leverage the Secure-HDFS as the UFS.
In order to mount a secure HDFS to Alluxio, you will need a Kerberos principal and keytab file for an HDFS user. This HDFS user should be superuser for HDFS and be able to impersonate other HDFS users. If this HDFS user does not have impersonation access, property alluxio.security.underfs.hdfs.impersonation.enabled must be turned off manually to disable impersonation.
In order for an HDFS user to be a superuser, the user must be in the OS group on the namenode, specified by the Hadoop configuration: dfs.permissions.superusergroup.
In order to enable an HDFS user to impersonate other HDFS users, additional Hadoop configuration is required. To enable impersonation for an HDFS user named alluxiohdfs, the following HDFS configuration parameters need to be set in core-site.xml and HDFS must be restarted:
<property>
  <name>hadoop.proxyuser.alluxiohdfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.alluxiohdfs.groups</name>
  <value>*</value>
</property>
For more details on HDFS impersonation, please refer to the dedicated section in the security documentation.
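The same two properties are needed for whichever HDFS user Alluxio connects as, so they can be generated from the user name. This is a minimal sketch with a hypothetical function name, print_proxyuser_properties; the optional host and group arguments default to the wildcard used above.

```shell
# Print the two hadoop.proxyuser.* properties for one HDFS user.
# $1 = HDFS user, $2 = allowed hosts (default *), $3 = allowed groups (default *).
print_proxyuser_properties() {
  user="$1"
  hosts="${2:-*}"
  groups="${3:-*}"
  cat <<EOF
<property>
  <name>hadoop.proxyuser.${user}.hosts</name>
  <value>${hosts}</value>
</property>
<property>
  <name>hadoop.proxyuser.${user}.groups</name>
  <value>${groups}</value>
</property>
EOF
}

print_proxyuser_properties alluxiohdfs
```

Passing a comma-separated list of the Alluxio master and worker hostnames as the second argument, instead of relying on the wildcard, limits impersonation to requests that originate from those hosts.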
Once HDFS is configured for the alluxiohdfs user, and the Kerberos keytab is generated for the principal, the keytab must be distributed to all of the Alluxio servers (workers and masters). Now, Alluxio is ready to mount a secure HDFS. There are two ways to mount a secure HDFS to Alluxio: a root mount, or a nested mount.
Secure HDFS as a root mount
To configure Alluxio to root mount a secure HDFS, several configuration parameters are necessary in alluxio-site.properties:
alluxio.underfs.address=hdfs://<ADDRESS>/<PATH>/
alluxio.master.mount.table.root.option.alluxio.underfs.hdfs.version=<HDFS_VERSION>
alluxio.master.mount.table.root.option.alluxio.underfs.hdfs.configuration=core-site.xml:hdfs-site.xml
alluxio.master.mount.table.root.option.alluxio.security.underfs.hdfs.kerberos.client.principal=alluxiohdfs@ALLUXIO.COM
alluxio.master.mount.table.root.option.alluxio.security.underfs.hdfs.kerberos.client.keytab.file=/alluxio/alluxiohdfs.keytab
alluxio.master.mount.table.root.option.alluxio.security.underfs.hdfs.impersonation.enabled=true|false
- alluxio.underfs.address: Specifies the URI of the HDFS to mount.
- alluxio.master.mount.table.root.option.alluxio.underfs.hdfs.configuration: Points to a colon-separated (:) list of files that define the HDFS configuration. Typically, this should point to the core-site.xml file and the hdfs-site.xml file. These configuration files must be available in the worker containers as well.
- alluxio.master.mount.table.root.option.alluxio.security.underfs.hdfs.kerberos.client.principal: Specifies the principal name to connect to this HDFS. In this example, it is alluxiohdfs@ALLUXIO.COM.
- alluxio.master.mount.table.root.option.alluxio.security.underfs.hdfs.kerberos.client.keytab.file: Specifies the location of the keytab file for the principal. This location must be the same on all the masters and workers.
- alluxio.master.mount.table.root.option.alluxio.security.underfs.hdfs.impersonation.enabled: If true, this means Alluxio should connect to the HDFS cluster using impersonation. If false, Alluxio will interact with the HDFS cluster directly with the previously specified principal.
Once these parameters are configured, Alluxio will have the secure HDFS cluster mounted at the root.
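Because the HDFS configuration files must be present on every master and worker, it can be useful to check the colon-separated list on each node before starting Alluxio. The following sketch uses a hypothetical function name, check_hdfs_conf_files, and simply tests that each listed file is readable.

```shell
# Check that every file in a ':'-separated list exists and is readable.
# $1 = value of the alluxio.underfs.hdfs.configuration option.
# The body runs in a subshell so the IFS change does not leak to the caller.
check_hdfs_conf_files() (
  status=0
  IFS=:
  for f in $1; do
    if [ -r "$f" ]; then
      echo "ok: $f"
    else
      echo "not readable: $f" >&2
      status=1
    fi
  done
  exit "$status"
)
```

For example, check_hdfs_conf_files /etc/hadoop/conf/core-site.xml:/etc/hadoop/conf/hdfs-site.xml returns a non-zero status if either file is missing on the node where it is run.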
Secure HDFS as a nested mount
Alluxio can also mount a secure HDFS as a nested mount (not the root mount). Configuring Alluxio in this scenario is very similar to the root mount scenario, except that the configuration is specified in the mount command rather than in the configuration file. The following Alluxio CLI command mounts a secure HDFS as a nested mount:
$ ./bin/alluxio fs mount --option alluxio.underfs.hdfs.version=<HDFS_VERSION> \
  --option alluxio.underfs.hdfs.configuration=core-site.xml:hdfs-site.xml \
  --option alluxio.security.underfs.hdfs.kerberos.client.principal=alluxiohdfs@ALLUXIO.COM \
  --option alluxio.security.underfs.hdfs.kerberos.client.keytab.file=/alluxio/alluxiohdfs.keytab \
  --option alluxio.security.underfs.hdfs.impersonation.enabled=true|false \
  /mnt/secure-hdfs/ hdfs://<ADDRESS>/<PATH>/
The parameters are the same as those described for the root mount above, without the alluxio.master.mount.table.root.option prefix.
Running Spark with Kerberos-enabled Alluxio and Secure-HDFS
Follow the Running-Spark-on-Alluxio guide to set up SPARK_CLASSPATH. In addition, the following items should be added to make Spark aware of Kerberos configuration:
- You can only use Spark on a Kerberos-enabled cluster in the YARN mode, not in the Standalone mode. Therefore, a secure YARN must be set up first. 
- Copy the Hadoop configuration files hdfs-site.xml, core-site.xml, and yarn-site.xml (usually in /etc/hadoop/conf/) to {SPARK_HOME}/conf.
- Copy the Alluxio site configuration {ALLUXIO_HOME}/conf/alluxio-site.properties to {SPARK_HOME}/conf so that Spark picks up Alluxio configurations such as the Kerberos-related flags.
- When launching the Spark shell or jobs, add --principal and --keytab to specify the Kerberos principal and keytab file for Spark.
./bin/spark-shell --principal=alluxio/localhost@ALLUXIO.COM --keytab=/etc/alluxio/conf/alluxio.keytab
FAQ
Java Kerberos error messages can be hard to interpret. In general, it is helpful to enable Kerberos debug messages by adding the following to the JVM.
-Dsun.security.krb5.debug=true
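One way to add the flag without duplicating it is sketched below. The helper name append_jvm_flag is hypothetical, and ALLUXIO_JAVA_OPTS is assumed to be the variable that the Alluxio launch scripts read (typically set in conf/alluxio-env.sh); adjust the variable name if your deployment passes JVM options differently.

```shell
# Append a JVM flag to an options string unless it is already present.
# $1 = current options (may be empty), $2 = flag to add.
append_jvm_flag() {
  case " $1 " in
    *" $2 "*) printf '%s\n' "$1" ;;       # flag already present
    "  ")     printf '%s\n' "$2" ;;       # no existing options
    *)        printf '%s %s\n' "$1" "$2" ;;
  esac
}

# Assumption: ALLUXIO_JAVA_OPTS is picked up by the Alluxio launch scripts.
ALLUXIO_JAVA_OPTS=$(append_jvm_flag "${ALLUXIO_JAVA_OPTS:-}" "-Dsun.security.krb5.debug=true")
export ALLUXIO_JAVA_OPTS
```

The debug output is verbose, so it is best enabled only while diagnosing a login problem and removed afterwards.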