Impala
This guide describes how to set up Impala through Cloudera Manager so that it can use Alluxio as its filesystem.
Prerequisites
You should already have Cloudera's Distribution (CDH) installed. CDH 6 has been tested, and Cloudera Manager is used for the instructions in the rest of this document.
It is also assumed that Alluxio has been installed on the cluster.
Running CDH Impala
To run CDH Impala applications with Alluxio, some additional configuration is required. The following configurations assume that Alluxio is installed in /opt/alluxio.
Configuring core-site.xml files
Add the following property to core-site.xml in each of the following sections:
- Under the HDFS component, select Configuration and search for:
  - Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml
      <property>
        <name>fs.alluxio.impl</name>
        <value>alluxio.hadoop.FileSystem</value>
      </property>
- Under the Impala component, select Configuration and search for:
  - Impala Catalog Server Advanced Configuration Snippet (Safety Valve) for core-site.xml
      <property>
        <name>fs.alluxio.impl</name>
        <value>alluxio.hadoop.FileSystem</value>
      </property>
  - Impala Daemon Advanced Configuration Snippet (Safety Valve) for core-site.xml
      <property>
        <name>fs.alluxio.impl</name>
        <value>alluxio.hadoop.FileSystem</value>
      </property>
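After the configuration has been redeployed, it can be useful to confirm on a host that the generated core-site.xml really contains the property. The following is a minimal sketch, not part of the Alluxio or Cloudera tooling: the function name has_alluxio_impl is hypothetical, and the location of the generated file depends on your deployment.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an Alluxio or Cloudera command): succeeds if the
# given core-site.xml declares fs.alluxio.impl = alluxio.hadoop.FileSystem.
has_alluxio_impl() {
  local file="$1"
  # Join the file onto one line so the name/value pair still matches when
  # the property is spread over several lines.
  tr -d '\n' < "$file" \
    | grep -Eq '<name>[[:space:]]*fs\.alluxio\.impl[[:space:]]*</name>[[:space:]]*<value>[[:space:]]*alluxio\.hadoop\.FileSystem[[:space:]]*</value>'
}
```

For example, `has_alluxio_impl /etc/hadoop/conf/core-site.xml && echo configured`, where /etc/hadoop/conf is an assumed client configuration directory; substitute the path used on your hosts.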
 
Configuring CLASSPATH
Edit the following sections to add the Alluxio client jar to the application classpath. The examples assume that the jar is located at /opt/alluxio/client/alluxio-enterprise-alluxio-2.8.0-5.4-client.jar; verify that the value you set is a valid path to the Alluxio client jar in your installation.
- Under the YARN (MR2 Included) component, select Configuration and search for:
  - Gateway Client Environment Advanced Configuration Snippet (Safety Valve) for hadoop-env.sh
      HADOOP_CLASSPATH=/opt/alluxio/client/alluxio-enterprise-alluxio-2.8.0-5.4-client.jar:${HADOOP_CLASSPATH}
  - YARN Application Classpath
      /opt/alluxio/client/alluxio-enterprise-alluxio-2.8.0-5.4-client.jar
  - MR Application Classpath
      /opt/alluxio/client/alluxio-enterprise-alluxio-2.8.0-5.4-client.jar
- Under the Impala component, select Configuration and search for:
  - Impala Service Environment Advanced Configuration Snippet (Safety Valve)
      CLASSPATH=/opt/alluxio/client/alluxio-enterprise-alluxio-2.8.0-5.4-client.jar:${CLASSPATH}
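Because a mistyped jar path only surfaces later as a class-not-found error, it may help to check the path before pasting it into Cloudera Manager. The sketch below is illustrative only: prepend_alluxio_jar is a hypothetical helper that verifies the jar exists and prints the classpath value with the jar placed first, as in the settings above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a classpath with the Alluxio client jar first.
# Fails if the jar does not exist, so a wrong path is caught early.
prepend_alluxio_jar() {
  local jar="$1" current="${2:-}"
  if [ ! -f "$jar" ]; then
    echo "Alluxio client jar not found: $jar" >&2
    return 1
  fi
  if [ -n "$current" ]; then
    printf '%s:%s\n' "$jar" "$current"
  else
    # No existing classpath: emit only the jar, without a trailing colon.
    printf '%s\n' "$jar"
  fi
}
```

For example, `prepend_alluxio_jar /opt/alluxio/client/alluxio-enterprise-alluxio-2.8.0-5.4-client.jar "$HADOOP_CLASSPATH"` prints the value to use for HADOOP_CLASSPATH, or an error if the jar is missing on that host.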
Configuring HIVE_AUX_JARS_PATH
Under the Hive component, select Configuration and search for Hive Auxiliary JARs Directory:
  /opt/alluxio/client/
Example: Create an Impala table in Alluxio from HDFS
The following example creates an internal table in Impala backed by files in Alluxio. Download the MovieLens 100K dataset from http://grouplens.org/datasets/movielens/, unzip it, and upload the extracted data to /ml-100k/ in Alluxio:
$ ./bin/alluxio fs mkdir /ml-100k
$ ./bin/alluxio fs copyFromLocal /path/to/ml-100k alluxio:///ml-100k
Connect to Impala using the impala-shell:
$ impala-shell -i myHostname
where myHostname is the name of the host to connect to.
Create a new internal table pointing to Alluxio.
CREATE TABLE u_user (
  userid INT,
  age INT,
  gender CHAR(1),
  occupation STRING,
  zipcode STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'alluxio://master_hostname:port/ml-100k';
An external table can be created by changing CREATE TABLE to CREATE EXTERNAL TABLE in the previous command.
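The LOCATION value in the statement above has to be filled in with your own master host and port. As a small sketch of how that URI is assembled, the hypothetical helper alluxio_location below joins the three parts; the example call assumes the default Alluxio master RPC port 19998, so use whatever port your master is actually configured with.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the alluxio:// URI for a LOCATION clause.
alluxio_location() {
  local host="$1" port="$2" path="$3"
  # Add a leading slash if the path lacks one.
  path="/${path#/}"
  printf 'alluxio://%s:%s%s\n' "$host" "$port" "$path"
}

# Example (19998 is an assumed default port; master_hostname is a placeholder):
alluxio_location master_hostname 19998 ml-100k
```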