Kerberos (Java)
This documentation describes how to set up an Alluxio cluster with Kerberos security, using a single AWS EC2 Linux machine running Alluxio locally as an example. To set up a cluster on multiple nodes, replace the host field (localhost) in the Kerberos principals with <your.cluster.name>, or use hostname-associated principal names and leave alluxio.security.kerberos.unified.instance.name unset.
Some frequently seen problems and questions are listed at the end of the document.
Setup Key Distribution Center (KDC)
Add Principals And Generate Keytab Files in KDC
Setup client-side Kerberos on Alluxio cluster
Please set up a standalone KDC before doing this. The KDC plays the role of a server, providing authentication service to clients. All other nodes that contact the KDC for authentication are considered clients. In a Kerberized Alluxio cluster, all the Alluxio nodes need to contact the KDC as clients. Therefore, follow this guide to set up the Kerberos client-side packages and configuration on each node in the Alluxio cluster (not the KDC node). Kerberos clients also need an /etc/krb5.conf to communicate with the KDC. The Kerberos client settings also work if you want to set up a local Alluxio cluster on Mac OS X.
Here is a sample /etc/krb5.conf on an Alluxio node:
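As a minimal sketch, assuming the realm ALLUXIO.COM used throughout this guide, the shell function below writes a bare-bones client configuration. The function name write_krb5_conf, its arguments, and the specific [libdefaults] values are illustrative assumptions, not settings taken from a real deployment. Replace the KDC host with the DNS name of your own KDC.

```shell
# Hypothetical helper: write a minimal Kerberos client configuration.
#   $1 = output path (on a real node this would be /etc/krb5.conf, written as root)
#   $2 = KDC host name (an assumption; substitute your own KDC)
write_krb5_conf() {
  local out="$1"
  local kdc_host="$2"
  cat > "$out" <<EOF
[libdefaults]
    default_realm = ALLUXIO.COM
    dns_lookup_realm = false
    dns_lookup_kdc = false
    ticket_lifetime = 24h
    forwardable = true

[realms]
    ALLUXIO.COM = {
        kdc = ${kdc_host}
        admin_server = ${kdc_host}
    }

[domain_realm]
    .alluxio.com = ALLUXIO.COM
    alluxio.com = ALLUXIO.COM
EOF
}
```

For example, write_krb5_conf /tmp/krb5.conf kdc.example.com writes the file to a scratch location so it can be reviewed before being copied to /etc/krb5.conf.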
Setup Alluxio Cluster with Kerberos Security
Create the users alluxio, client and foo on the machines that you will install Alluxio on. The user alluxio corresponds to the Kerberos principal alluxio/localhost@ALLUXIO.COM, client corresponds to client/localhost@ALLUXIO.COM, and foo corresponds to foo/localhost@ALLUXIO.COM. The user alluxio will be the Alluxio service user that starts, manages and stops Alluxio servers. This user does not have to be called alluxio in your own deployment; it can be an arbitrary string as long as it complies with the naming rules of the underlying operating system.
$ sudo adduser alluxio
$ sudo adduser client
$ sudo adduser foo
$ sudo passwd alluxio
$ sudo passwd client
$ sudo passwd foo
Alluxio server processes (masters, workers, and so on) will run as the user alluxio, so add alluxio to sudoers so that the user has permission to access ramdisks.
Add the following lines to the end of /etc/sudoers (or use visudo as root):
# User privilege specification
alluxio ALL=(ALL) NOPASSWD:ALL
Then, distribute the server and client keytab files from the KDC to each node of the Alluxio cluster. Save them in a secure place and configure the user and group permissions accordingly. The following snippets save the keytab files into /etc/alluxio/conf; create the directory on each Alluxio node if it does not exist.
$ scp -i ~/your_aws_key_pair.pem <KDC_DNS_NAME>:alluxio.keytab /etc/alluxio/conf/
$ scp -i ~/your_aws_key_pair.pem <KDC_DNS_NAME>:client.keytab /etc/alluxio/conf/
$ scp -i ~/your_aws_key_pair.pem <KDC_DNS_NAME>:foo.keytab /etc/alluxio/conf/
$ sudo chown alluxio:alluxio /etc/alluxio/conf/alluxio.keytab
$ sudo chown client:alluxio /etc/alluxio/conf/client.keytab
$ sudo chown foo:alluxio /etc/alluxio/conf/foo.keytab
$ sudo chmod 0440 /etc/alluxio/conf/alluxio.keytab
$ sudo chmod 0440 /etc/alluxio/conf/client.keytab
$ sudo chmod 0440 /etc/alluxio/conf/foo.keytab
The owner of each keytab file should be the user who needs to access it.
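One way to double-check the result is a small permission test. The sketch below is a hypothetical helper (check_keytab_mode is not part of Alluxio) and assumes GNU coreutils stat, as found on typical Linux distributions. It only verifies that the file exists and grants no permissions to "other"; it does not check ownership.

```shell
# Hypothetical helper: succeed only if the keytab exists and grants
# no permissions to "other". Assumes GNU coreutils stat (Linux).
check_keytab_mode() {
  local keytab="$1"
  [ -f "$keytab" ] || { echo "missing: $keytab" >&2; return 1; }
  local mode
  mode=$(stat -c '%a' "$keytab") || return 1
  # The last octal digit of the mode holds the permissions for "other".
  case "$mode" in
    *0) return 0 ;;
    *)  echo "too permissive ($mode): $keytab" >&2; return 1 ;;
  esac
}
```

For example, check_keytab_mode /etc/alluxio/conf/alluxio.keytab should succeed after the chmod 0440 commands above.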
To transfer files from Windows to Linux, you can use scp through Cygwin, or use pscp.exe in PuTTY.
Server Configuration
Log in as alluxio by executing the following:
$ su - alluxio
All the operations required for the rest of the server configuration should be performed by the user alluxio.
When installing Alluxio, you can add the following configuration properties to alluxio-site.properties.
alluxio.security.authentication.type=KERBEROS
alluxio.security.authorization.permission.enabled=true
alluxio.security.kerberos.unified.instance.name=localhost
alluxio.security.kerberos.server.principal=alluxio/localhost@ALLUXIO.COM
alluxio.security.kerberos.server.keytab.file=/etc/alluxio/conf/alluxio.keytab
In versions before 2.1.0, you must also set the required parameter alluxio.security.kerberos.service.name:
alluxio.security.kerberos.service.name=alluxio
Note:
alluxio.security.kerberos.service.name was a required parameter before Alluxio 2.1.0. In 2.1.0 this parameter was removed, because the service name can be extracted from the server principal alluxio.security.kerberos.server.principal.
alluxio.security.kerberos.server.principal is a required parameter in the JAAS environment. It should be in the format <primary>/<instance>@REALM.COM. The server principal must have an <instance> name that matches the server hostname, i.e. alluxio.master.hostname or alluxio.worker.hostname. When Alluxio starts, the server principal is propagated to clients via cluster defaults. If cluster defaults is disabled by alluxio.user.conf.cluster.default.enabled=false, then the clients need to be configured with the server principal themselves.
alluxio.security.kerberos.unified.instance.name is optional, and is used when all the Alluxio servers share a single principal and a unified instance name. If it is not specified, alluxio.security.kerberos.server.principal must have an <instance> name that matches the server hostname, i.e. alluxio.master.hostname or alluxio.worker.hostname.
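To make the <primary>/<instance>@REALM.COM format concrete, here is a small sketch that extracts the instance component from a principal string so it can be compared with a hostname. The function name principal_instance is hypothetical and the parsing is plain shell parameter expansion; it is not how Alluxio itself parses principals.

```shell
# Hypothetical helper: print the <instance> part of <primary>/<instance>@<REALM>.
# Returns non-zero if the principal has no instance component.
principal_instance() {
  local principal="$1"
  # Drop the realm: everything from the last '@' onward.
  local without_realm="${principal%@*}"
  case "$without_realm" in
    */*) printf '%s\n' "${without_realm#*/}" ;;
    *)   return 1 ;;
  esac
}

# Example: compare the instance with the unified instance name used in this guide.
if [ "$(principal_instance 'alluxio/localhost@ALLUXIO.COM')" = "localhost" ]; then
  echo "instance matches"
fi
```

When alluxio.security.kerberos.unified.instance.name is not set, the same comparison would be made against the value of alluxio.master.hostname or alluxio.worker.hostname on each server.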
Once the installation and configuration are complete, start the Alluxio service by executing the following:
$ ./bin/alluxio format
$ ./bin/alluxio-start.sh local SudoMount
Client Configuration
Client-side access to the Alluxio cluster requires the following configuration. (Note: the server keytab file is not required for the client. The keytab file permissions are configured so that client users cannot access the server keytab file.)
alluxio.security.authentication.type=KERBEROS
alluxio.security.authorization.permission.enabled=true
alluxio.security.kerberos.unified.instance.name=localhost
alluxio.security.kerberos.client.principal=client/localhost@ALLUXIO.COM
alluxio.security.kerberos.client.keytab.file=/etc/alluxio/conf/client.keytab
You can switch users by changing the client principal and keytab pair. An alternative client Kerberos login option is to invoke kinit on client machines.
$ kinit -k -t /etc/alluxio/conf/client.keytab client/localhost@ALLUXIO.COM
An invalid principal/keytab combination, or failure to find a valid Kerberos credential in the ticket cache, results in the following error message. It indicates that the user cannot log in via Kerberos.
Failed to login: <detailed reason>
Please see the FAQ section for more details about login failures.
Run Sample Tests
After Alluxio is configured and installed, you can run simple tests that write several files to Alluxio and the configured UFS.
$ ./bin/alluxio runTests
Example
You can play with the following examples to verify that the Alluxio cluster you set up is indeed Kerberos-enabled.
First, act as the super user alluxio by setting the following configurations in conf/alluxio-site.properties:
alluxio.security.kerberos.client.principal=alluxio/localhost@ALLUXIO.COM
alluxio.security.kerberos.client.keytab.file=/etc/alluxio/conf/alluxio.keytab
Create some directories for different users via Alluxio filesystem shell:
$ ./bin/alluxio fs ls /
$ ./bin/alluxio fs mkdir /admin
$ ./bin/alluxio fs mkdir /client
$ ./bin/alluxio fs chown client /client
$ ./bin/alluxio fs chgrp client /client
$ ./bin/alluxio fs mkdir /foo
$ ./bin/alluxio fs chown foo /foo
$ ./bin/alluxio fs chgrp foo /foo
Now, you have /admin owned by user alluxio, /client owned by user client, and /foo owned by user foo.
If you change one or both of the above configurations to an empty or wrong value, Kerberos authentication should fail, so any ./bin/alluxio fs command should fail too.
Second, act as user client by re-configuring conf/alluxio-site.properties:
alluxio.security.kerberos.client.principal=client/localhost@ALLUXIO.COM
alluxio.security.kerberos.client.keytab.file=/etc/alluxio/conf/client.keytab
Create some directories and put some files into Alluxio:
$ ./bin/alluxio fs ls -R /
$ ./bin/alluxio fs mkdir /client/dir
$ ./bin/alluxio fs copyFromLocal conf/alluxio-site.properties /client/file
$ ./bin/alluxio fs rm -R /client/dir
# This will fail
$ ./bin/alluxio fs mkdir /foo/bar
# This will fail
$ ./bin/alluxio fs rm -R /foo
The last two commands should fail since user client has no write permission to /foo, which is owned by user foo.
Similarly, switch to user foo and try the filesystem shell:
alluxio.security.kerberos.client.principal=foo/localhost@ALLUXIO.COM
alluxio.security.kerberos.client.keytab.file=/etc/alluxio/conf/foo.keytab
$ ./bin/alluxio fs ls -R /
$ ./bin/alluxio fs mkdir /foo/bar
$ ./bin/alluxio fs copyFromLocal conf/alluxio-site.properties /foo/bar/testfile
# This will fail
$ ./bin/alluxio fs copyFromLocal conf/alluxio-site.properties /client/foofile
The last command should fail because user foo has no write permission to /client, which is owned by user client.
Alternatively, if the Kerberos credential cache is of type DIR or FILE, the client can log in by loading the credentials from the cache instead of the keytab file.
alluxio.security.kerberos.client.principal=client/localhost@ALLUXIO.COM
alluxio.security.kerberos.client.keytab.file=
$ kinit -k -t /etc/alluxio/conf/client.keytab client/localhost@ALLUXIO.COM
This would have the same effect as setting up the client keytab files. You can validate this by running similar examples as above:
$ ./bin/alluxio fs ls -R /
$ ./bin/alluxio fs mkdir /client/dir
$ ./bin/alluxio fs copyFromLocal conf/alluxio-site.properties /client/file
$ ./bin/alluxio fs rm -R /client/dir
# This will fail
$ ./bin/alluxio fs mkdir /foo/bar
# This will fail
$ ./bin/alluxio fs rm -R /foo
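Whether the cache-based login applies depends on the credential cache type. As a rough sketch, the cache type can be read from the KRB5CCNAME environment variable, which MIT Kerberos interprets as TYPE:residual and treats as FILE when no type prefix is given. The helper name ccache_type is hypothetical. When KRB5CCNAME is unset, the default comes from the Kerberos configuration and this sketch simply reports "unknown".

```shell
# Hypothetical helper: report the credential cache type implied by a
# KRB5CCNAME-style value (first argument, or $KRB5CCNAME if omitted).
ccache_type() {
  local name="${1:-${KRB5CCNAME:-}}"
  case "$name" in
    "")  echo "unknown" ;;          # default comes from krb5.conf, not inspected here
    *:*) echo "${name%%:*}" ;;      # explicit TYPE: prefix, e.g. FILE, DIR, KEYRING
    *)   echo "FILE" ;;             # a bare path is treated as a FILE cache
  esac
}

# Example: only rely on the ticket cache login for DIR or FILE caches.
case "$(ccache_type 'FILE:/tmp/krb5cc_1000')" in
  FILE|DIR) echo "cache type supported for cache-based login" ;;
  *)        echo "use a keytab instead" ;;
esac
```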
Using Delegation Token
When Kerberized Alluxio is used with a Kerberized Hadoop cluster, Alluxio can be configured to use delegation tokens instead of client principals and keytabs on compute nodes. Using delegation tokens reduces the load on the KDC by greatly reducing the number of requests sent to it when a compute job starts. It also removes the need to deploy a client keytab to every compute node, which makes Alluxio clients easier to deploy and maintain. It is recommended to use delegation tokens whenever possible.
To enable delegation tokens on Alluxio, first configure the compute frameworks to obtain delegation tokens from Alluxio.
Start by adding the Alluxio client jar location to the YARN resource manager classpath:
export HADOOP_CLASSPATH=<PATH_TO_ALLUXIO_CLIENT_JAR>:${HADOOP_CLASSPATH}
Replace <PATH_TO_ALLUXIO_CLIENT_JAR> with the actual Alluxio client jar location on the YARN resource manager node. After the change, please restart the resource manager.
For Spark, please add the following property to spark-defaults.conf and restart Spark and YARN:
spark.yarn.access.hadoopFileSystems=<ALLUXIO_ROOT_URL>
Replace <ALLUXIO_ROOT_URL> with the actual Alluxio URL starting with alluxio://. In single master mode, this URL can be alluxio://<HOSTNAME>:<PORT>/. In HA mode, this URL should be alluxio://<ALLUXIO_SERVICE_ALIAS>/.
For MapReduce, please add the following property and restart YARN:
mapreduce.job.hdfs-servers=<ALLUXIO_ROOT_URL>
Replace <ALLUXIO_ROOT_URL> with the actual Alluxio URL starting with alluxio://.
To eliminate the need for a client keytab on compute nodes, capability should also be enabled on the Alluxio cluster. Please set the following property in alluxio-site.properties on all Alluxio nodes:
alluxio.security.authorization.capability.enabled=true
Also make sure the following client principal and keytab properties are not set in the client and server configuration:
alluxio.security.kerberos.client.principal=<CLIENT_PRINCIPAL>
alluxio.security.kerberos.client.keytab.file=<CLIENT_KEYTAB>
Please restart Alluxio and corresponding compute framework clients after the configuration change.
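A quick way to confirm that no client credentials remain in a properties file is a grep-based check. The sketch below is illustrative only: client_credentials_unset is a hypothetical helper, and it assumes the simple key=value layout shown in this guide (it ignores commented-out lines and keys with empty values).

```shell
# Hypothetical helper: succeed if the given properties file does not assign
# a non-empty value to the client principal or client keytab property.
client_credentials_unset() {
  local props="$1"
  ! grep -Eq '^[[:space:]]*alluxio\.security\.kerberos\.client\.(principal|keytab\.file)[[:space:]]*=[[:space:]]*[^[:space:]]' "$props"
}

# Example usage against the site configuration of this guide.
if [ -f conf/alluxio-site.properties ]; then
  if client_credentials_unset conf/alluxio-site.properties; then
    echo "no client principal or keytab configured"
  else
    echo "client principal or keytab is still set" >&2
  fi
fi
```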
Kerberos-enabled Alluxio Integration with Secure-HDFS as UFS
If there is an existing Secure-HDFS with Kerberos enabled, here are the instructions to set up Alluxio to leverage the Secure-HDFS as the UFS.
In order to mount a secure HDFS to Alluxio, you will need a Kerberos principal and keytab file for an HDFS user. This HDFS user should be a superuser for HDFS and be able to impersonate other HDFS users. If this HDFS user does not have impersonation access, the property alluxio.underfs.hdfs.impersonation.enabled must be turned off manually to disable impersonation.
In order for an HDFS user to be a superuser, the user must be in the OS group on the namenode that is specified by the Hadoop configuration dfs.permissions.superusergroup.
In order to enable an HDFS user to impersonate other HDFS users, additional Hadoop configuration is required. To enable impersonation for an HDFS user named alluxiohdfs, the following HDFS configuration parameters need to be set in core-site.xml, and HDFS must be restarted:
<property>
  <name>hadoop.proxyuser.alluxiohdfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.alluxiohdfs.groups</name>
  <value>*</value>
</property>
Once HDFS is configured for the alluxiohdfs user, and the Kerberos keytab is generated for the principal, the keytab must be distributed to all of the Alluxio servers (workers and masters). Now, Alluxio is ready to mount a secure HDFS. There are two ways to mount a secure HDFS to Alluxio: a root mount, or a nested mount.
Secure HDFS as a root mount
To configure Alluxio to root mount a secure HDFS, several configuration parameters are necessary in alluxio-site.properties:
alluxio.underfs.address=hdfs://<ADDRESS>/<PATH>/
alluxio.master.mount.table.root.option.alluxio.underfs.hdfs.version=<HDFS_VERSION>
alluxio.master.mount.table.root.option.alluxio.underfs.hdfs.configuration=core-site.xml:hdfs-site.xml
alluxio.master.mount.table.root.option.alluxio.security.underfs.hdfs.kerberos.client.principal=alluxiohdfs@ALLUXIO.COM
alluxio.master.mount.table.root.option.alluxio.security.underfs.hdfs.kerberos.client.keytab.file=/alluxio/alluxiohdfs.keytab
alluxio.master.mount.table.root.option.alluxio.security.underfs.hdfs.impersonation.enabled=true|false
alluxio.underfs.address: Specifies the URI of the HDFS to mount.
alluxio.master.mount.table.root.option.alluxio.underfs.hdfs.configuration: Points to a colon-separated (:) list of files that define the HDFS configuration. Typically, this should point to the core-site.xml file and the hdfs-site.xml file. These configuration files must be available in the worker containers as well.
alluxio.master.mount.table.root.option.alluxio.security.underfs.hdfs.kerberos.client.principal: Specifies the principal name used to connect to this HDFS. In this example, it is alluxiohdfs@ALLUXIO.COM.
alluxio.master.mount.table.root.option.alluxio.security.underfs.hdfs.kerberos.client.keytab.file: Specifies the location of the keytab file for the principal. This location must be the same on all the masters and workers.
alluxio.master.mount.table.root.option.alluxio.security.underfs.hdfs.impersonation.enabled: If true, Alluxio connects to the HDFS cluster using impersonation. If false, Alluxio interacts with the HDFS cluster directly as the previously specified principal.
Once these parameters are configured, Alluxio will have the secure HDFS cluster mounted at the root.
Secure HDFS as a nested mount
Alluxio can also mount a secure HDFS as a nested mount (not the root mount). The configuration is very similar to the root mount scenario, except that it is specified in the mount command rather than in the configuration file. The following Alluxio CLI command mounts a secure HDFS as a nested mount:
$ ./bin/alluxio fs mount --option alluxio.underfs.hdfs.version=<HDFS_VERSION> \
--option alluxio.underfs.hdfs.configuration=core-site.xml:hdfs-site.xml \
--option alluxio.security.underfs.hdfs.kerberos.client.principal=alluxiohdfs@ALLUXIO.COM \
--option alluxio.security.underfs.hdfs.kerberos.client.keytab.file=/alluxio/alluxiohdfs.keytab \
--option alluxio.security.underfs.hdfs.impersonation.enabled=true|false \
/mnt/secure-hdfs/ hdfs://<ADDRESS>/<PATH>/
The parameters are the same as those described for the root mount.
Running Spark with Kerberos-enabled Alluxio and Secure-HDFS
Follow the Running-Spark-on-Alluxio guide to set up SPARK_CLASSPATH. In addition, the following items should be added to make Spark aware of the Kerberos configuration:
You can only use Spark on a Kerberos-enabled cluster in YARN mode, not in Standalone mode. Therefore, a secure YARN must be set up first.
Copy the Hadoop configuration files hdfs-site.xml, core-site.xml, and yarn-site.xml (usually in /etc/hadoop/conf/) to {SPARK_HOME}/conf.
Copy the Alluxio site configuration {ALLUXIO_HOME}/conf/alluxio-site.properties to {SPARK_HOME}/conf so that Spark picks up Alluxio configurations such as Kerberos-related flags.
When launching the Spark shell or jobs, add --principal and --keytab to specify the Kerberos principal and keytab file for Spark.
$ ./bin/spark-shell --principal=alluxio/localhost@ALLUXIO.COM --keytab=/etc/alluxio/conf/alluxio.keytab
FAQ
Java Kerberos error messages can be hard to interpret. In general, it is helpful to enable Kerberos debug messages by adding the following option to the JVM.
-Dsun.security.krb5.debug=true
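How the option reaches the JVM depends on how the processes are launched. As one possible sketch, assuming your Alluxio launch scripts read JVM options from the ALLUXIO_JAVA_OPTS environment variable (typically set in conf/alluxio-env.sh), the flag can be appended once like this. Verify the variable name against your Alluxio version before relying on it.

```shell
# Sketch for conf/alluxio-env.sh: append the Kerberos debug flag exactly once.
# ALLUXIO_JAVA_OPTS is assumed to be the variable your launch scripts read.
KRB5_DEBUG_FLAG="-Dsun.security.krb5.debug=true"
case " ${ALLUXIO_JAVA_OPTS:-} " in
  *" ${KRB5_DEBUG_FLAG} "*) ;;  # already present, leave unchanged
  *) ALLUXIO_JAVA_OPTS="${ALLUXIO_JAVA_OPTS:-} ${KRB5_DEBUG_FLAG}" ;;
esac
export ALLUXIO_JAVA_OPTS
```

Restart the affected processes after the change, and remove the flag once debugging is done, since the debug output is verbose.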