You should already have Cloudera's Distribution including Apache Hadoop (CDH) installed. These instructions were tested with CDH 6 and use Cloudera Manager throughout.
It is also assumed that Alluxio has been installed on the cluster.
Running CDH Impala
To run CDH Impala applications with Alluxio, some additional configuration is required. The following configurations assume that Alluxio is installed in /opt/alluxio.
Configuring core-site.xml files
Add the Alluxio client properties to core-site.xml through the following Cloudera Manager setting:
Under the HDFS component, select Configuration and search for Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml
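As a minimal sketch, the properties typically used to register the Alluxio Hadoop client are shown below. The class names alluxio.hadoop.FileSystem and alluxio.hadoop.AlluxioFileSystem are provided by the Alluxio client jar; the scratch file name is a hypothetical choice for illustration, and the exact property set should be checked against the documentation for your Alluxio version.

```shell
# Write the core-site.xml properties to a scratch file so the same text can be
# pasted into the Cloudera Manager safety valve. The file name is hypothetical.
SNIPPET_FILE=alluxio-core-site-snippet.xml

cat > "${SNIPPET_FILE}" <<'EOF'
<property>
  <name>fs.alluxio.impl</name>
  <value>alluxio.hadoop.FileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.alluxio.impl</name>
  <value>alluxio.hadoop.AlluxioFileSystem</value>
</property>
EOF

# Show the snippet for copying.
cat "${SNIPPET_FILE}"
```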
Edit the following sections to add the Alluxio client jar to the application classpath. The examples assume that the jar is located at /opt/alluxio/client/alluxio-enterprise-alluxio-2.8.0-5.4-client.jar; verify that the value you set is a valid path to the Alluxio client jar on your hosts.
Under the YARN (MR2 Included) component, select Configuration and search for:
Gateway Client Environment Advanced Configuration Snippet (Safety Valve) for hadoop-env.sh
YARN Application Classpath
MR Application Classpath
Under the Impala component, select Configuration and search for Impala Service Environment Advanced Configuration Snippet (Safety Valve)
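The classpath settings above can be sketched as a single environment line. This assumes the jar path given in the text and assumes that the hadoop-env.sh safety valve and the Impala service environment safety valve both accept a HADOOP_CLASSPATH assignment; confirm this against your Cloudera Manager version. For the YARN Application Classpath and MR Application Classpath fields, the same jar path would be added as an extra entry instead.

```shell
# Path to the Alluxio client jar, as assumed in the text; adjust to your install.
ALLUXIO_CLIENT_JAR=/opt/alluxio/client/alluxio-enterprise-alluxio-2.8.0-5.4-client.jar

# Prepend the jar while keeping any classpath that is already set.
HADOOP_CLASSPATH="${ALLUXIO_CLIENT_JAR}:${HADOOP_CLASSPATH}"

# Show the resulting value.
echo "${HADOOP_CLASSPATH}"
```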
Configuring HIVE_AUX_JARS_PATH
Under the Hive component, select Configuration and search for Hive Auxiliary JARs Directory
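Since this setting names a directory, not a single jar, one plausible value is the directory that holds the Alluxio client jar. The sketch below derives it from the jar path assumed earlier; the variable names are illustrative only, and the resulting directory is an assumption to verify on your hosts.

```shell
# Jar path assumed in this document; adjust to your install.
ALLUXIO_CLIENT_JAR=/opt/alluxio/client/alluxio-enterprise-alluxio-2.8.0-5.4-client.jar

# The directory containing the jar is the candidate value for the
# Hive Auxiliary JARs Directory field.
HIVE_AUX_JARS_DIR="$(dirname "${ALLUXIO_CLIENT_JAR}")"

echo "${HIVE_AUX_JARS_DIR}"   # prints /opt/alluxio/client
```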
Example: Create an Impala table in Alluxio from HDFS
This example creates an internal table in Impala backed by files in Alluxio. Download the MovieLens 100K dataset from http://grouplens.org/datasets/movielens/. Unzip the archive and upload the extracted data into /ml-100k/ in Alluxio.
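The upload step might look like the following sketch. It assumes the archive was extracted to ./ml-100k (a hypothetical local path) and uploads only the pipe-delimited u.user file, which is an illustrative choice; the commands are skipped when the Alluxio CLI or the data directory is not present.

```shell
ALLUXIO_HOME=/opt/alluxio             # install location assumed in this document
DATA_DIR="${DATA_DIR:-./ml-100k}"     # hypothetical path to the unzipped dataset
ALLUXIO_DEST=/ml-100k                 # target directory in Alluxio

if [ -x "${ALLUXIO_HOME}/bin/alluxio" ] && [ -d "${DATA_DIR}" ]; then
  # Create the target directory, then copy the user file into it.
  "${ALLUXIO_HOME}/bin/alluxio" fs mkdir "${ALLUXIO_DEST}"
  "${ALLUXIO_HOME}/bin/alluxio" fs copyFromLocal "${DATA_DIR}/u.user" "${ALLUXIO_DEST}/"
else
  echo "Alluxio CLI or ${DATA_DIR} not found; skipping upload" >&2
fi
```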
Connect to Impala using impala-shell with the -i option, passing the name of the host to connect to (written here as myHostname).
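A minimal sketch of the connection step, using the -i option of impala-shell to name the impalad host. myHostname is the placeholder from the text and must be replaced with a real host; the guard simply avoids an error on machines where impala-shell is not installed.

```shell
IMPALA_HOST=myHostname   # placeholder; replace with the host running impalad

if command -v impala-shell >/dev/null 2>&1; then
  # Opens an interactive session against the given host.
  impala-shell -i "${IMPALA_HOST}"
else
  echo "impala-shell not found on PATH" >&2
fi
```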
Create a new internal table pointing to Alluxio.
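As a hedged sketch, the statement could be written to a file and run through impala-shell. The table name u_user, the column names, and the file name are illustrative assumptions based on the pipe-delimited u.user file in the dataset; master_hostname is a placeholder for the Alluxio master, and 19998 is the default Alluxio master RPC port.

```shell
# Alluxio master address; master_hostname is a placeholder to replace.
ALLUXIO_MASTER="${ALLUXIO_MASTER:-master_hostname:19998}"
SQL_FILE=create_u_user.sql   # hypothetical file name

# Write the CREATE TABLE statement, pointing LOCATION at the Alluxio directory.
cat > "${SQL_FILE}" <<EOF
CREATE TABLE u_user (
  userid INT,
  age INT,
  gender CHAR(1),
  occupation STRING,
  zipcode STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LOCATION 'alluxio://${ALLUXIO_MASTER}/ml-100k';
EOF

cat "${SQL_FILE}"
```

The generated file can then be passed to impala-shell with the -f option, for example `impala-shell -i myHostname -f create_u_user.sql`, or the statement can be pasted into an interactive session.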
An external table can be created by modifying the previous command from CREATE TABLE to CREATE EXTERNAL TABLE.