Apache Flink
This guide describes how to get Alluxio running with Apache Flink, so that you can easily work with files stored in Alluxio.
Java 8 Update 161 or higher (8u161+), 64-bit, has been installed.
Alluxio has been set up and is running.
Flink has been installed and set up.
Apache Flink can use Alluxio through a generic file system wrapper for the Hadoop file system. Therefore, Alluxio is configured mostly in Hadoop configuration files.
Set the property in core-site.xml
If you have a Hadoop setup next to the Flink installation, add the following property to the core-site.xml configuration file:
If you don't have a Hadoop setup, you have to create a file called core-site.xml with the following contents:
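As a minimal sketch of the second case, the snippet below writes a standalone core-site.xml. The property name fs.alluxio.impl follows Hadoop's fs.<scheme>.impl convention and is an assumption here, as is the CONF_DIR location; the class name alluxio.hadoop.FileSystem is the one this guide refers to. If you already have a Hadoop setup, only the <property> element would need to be added to the existing file.

```shell
# Hypothetical target directory; point this at your own configuration directory.
CONF_DIR="${CONF_DIR:-./hadoop-conf}"
mkdir -p "$CONF_DIR"

# Only create the file if it does not exist, so an existing Hadoop
# configuration is never overwritten by this sketch.
if [ ! -e "$CONF_DIR/core-site.xml" ]; then
  cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Assumed property name: maps the alluxio:// scheme to the Alluxio client class -->
  <property>
    <name>fs.alluxio.impl</name>
    <value>alluxio.hadoop.FileSystem</value>
  </property>
</configuration>
EOF
fi
```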
Specify the path to core-site.xml in conf/flink-conf.yaml
Next, specify the path to the Hadoop configuration in Flink. Open the conf/flink-conf.yaml file in the Flink root directory and set the fs.hdfs.hadoopconf configuration value to the directory containing core-site.xml. (For newer Hadoop versions, the directory usually ends with etc/hadoop.)
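One way to apply this setting from the command line is sketched below. The FLINK_HOME fallback and the HADOOP_CONF_PATH value are placeholders rather than real paths, so substitute the directory that actually holds your core-site.xml.

```shell
# Assumed locations; adjust both to your installation.
FLINK_CONF="${FLINK_HOME:-.}/conf/flink-conf.yaml"
HADOOP_CONF_PATH="${HADOOP_CONF_PATH:-/path/to/hadoop/etc/hadoop}"

mkdir -p "$(dirname "$FLINK_CONF")"
touch "$FLINK_CONF"

# Append the key only if it is not already present, to avoid duplicate entries.
if ! grep -q '^fs\.hdfs\.hadoopconf:' "$FLINK_CONF"; then
  echo "fs.hdfs.hadoopconf: ${HADOOP_CONF_PATH}" >> "$FLINK_CONF"
fi
```

If the key is already set, the sketch leaves the file untouched; edit the existing line by hand in that case.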
We need to make the Alluxio jar file available to Flink, because it contains the configured alluxio.hadoop.FileSystem class.
There are different ways to achieve that:
Put the {{site.ALLUXIO_CLIENT_JAR_PATH}} file into the lib directory of Flink (for local and standalone cluster setups).
Put the {{site.ALLUXIO_CLIENT_JAR_PATH}} file into the ship directory for Flink on YARN.
Specify the location of the jar file in the HADOOP_CLASSPATH environment variable (make sure it is available on all cluster nodes as well). For example:
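The third option could look like the following sketch. ALLUXIO_CLIENT_JAR is a hypothetical helper variable and its default is a placeholder, not the real jar location; set it to the path of the client jar in your Alluxio distribution.

```shell
# Placeholder path; substitute the actual location of the Alluxio client jar.
ALLUXIO_CLIENT_JAR="${ALLUXIO_CLIENT_JAR:-/path/to/alluxio-client.jar}"

# Prepend the jar while keeping any entries already on HADOOP_CLASSPATH.
export HADOOP_CLASSPATH="${ALLUXIO_CLIENT_JAR}${HADOOP_CLASSPATH:+:${HADOOP_CLASSPATH}}"

echo "HADOOP_CLASSPATH=${HADOOP_CLASSPATH}"
```

The export only affects the current shell, so it would need to be repeated (or placed in a profile script) on every node that runs Flink processes.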
In addition, if any client-related properties are specified in conf/alluxio-site.properties, translate them to env.java.opts in {FLINK_HOME}/conf/flink-conf.yaml so that Flink picks up the Alluxio configuration. For example, to configure the Alluxio client to use CACHE_THROUGH as the write type, add the following to {FLINK_HOME}/conf/flink-conf.yaml:
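A minimal sketch of that translation is shown here. It assumes the write type is controlled by the Alluxio client property alluxio.user.file.writetype.default, passed to the Flink JVMs as a -D system property; check the property name against the configuration reference for your Alluxio version. The FLINK_HOME fallback is a placeholder.

```shell
# Assumed location of the Flink configuration file.
FLINK_CONF="${FLINK_HOME:-.}/conf/flink-conf.yaml"
mkdir -p "$(dirname "$FLINK_CONF")"
touch "$FLINK_CONF"

# Add env.java.opts only if it is not set yet; an existing value has to be
# merged by hand, since YAML allows the key to appear only once.
if ! grep -q '^env\.java\.opts:' "$FLINK_CONF"; then
  echo "env.java.opts: -Dalluxio.user.file.writetype.default=CACHE_THROUGH" >> "$FLINK_CONF"
else
  echo "env.java.opts already set in $FLINK_CONF; add the -D flag to it manually" >&2
fi
```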
Note: If there are running Flink clusters, stop and restart them to apply the configuration changes.
To use Alluxio with Flink, specify paths with the alluxio:// scheme.
If Alluxio is installed locally, a valid path looks like this: alluxio://localhost:19998/user/hduser/gutenberg.
This example assumes you have set up Alluxio and Flink as previously described.
Put the file LICENSE into Alluxio, assuming you are in the top-level Alluxio project directory:
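A sketch of this step, assuming Alluxio runs locally on the alluxio://localhost:19998 address used earlier in this guide and that the file should land at /LICENSE in the Alluxio namespace (the target path is an assumption). The guard only exists so the snippet fails gracefully when run from the wrong directory.

```shell
# Assumed master address and target path inside Alluxio.
ALLUXIO_MASTER="alluxio://localhost:19998"
TARGET_URI="${ALLUXIO_MASTER}/LICENSE"

if [ -x ./bin/alluxio ]; then
  # Copy the local LICENSE file into the Alluxio namespace.
  ./bin/alluxio fs copyFromLocal LICENSE "$TARGET_URI"
else
  echo "./bin/alluxio not found; run this from the top-level Alluxio directory" >&2
fi
```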
Run the following command from the top-level Flink project directory:
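The command could look like the sketch below, which runs the batch WordCount example on the LICENSE file stored in Alluxio. The jar location examples/batch/WordCount.jar and the --input and --output arguments are assumptions based on the example jobs shipped with Flink 1.x distributions; the Alluxio paths are assumed as well and should match wherever LICENSE was stored.

```shell
# Assumed Alluxio paths for the job input and output.
INPUT_URI="alluxio://localhost:19998/LICENSE"
OUTPUT_URI="alluxio://localhost:19998/output"

if [ -x ./bin/flink ]; then
  # Submit the bundled WordCount example, reading from and writing to Alluxio.
  ./bin/flink run examples/batch/WordCount.jar \
    --input "$INPUT_URI" \
    --output "$OUTPUT_URI"
else
  echo "./bin/flink not found; run this from the top-level Flink directory" >&2
fi
```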
To communicate with Alluxio, Flink programs need the Alluxio Core Client jar. We recommend downloading the tarball from the Alluxio download page. Alternatively, advanced users can compile this client jar from the source code by following the build instructions. The Alluxio client jar can be found at {{site.ALLUXIO_CLIENT_JAR_PATH}}.
Open your browser and check the Alluxio web UI. There should be an output file named output that contains the word counts of the file LICENSE.