Alluxio, formerly Tachyon, is an open source, memory-speed, virtual distributed storage system. It enables any application to interact with any data from any storage system at memory speed. Read more about Alluxio here.
What platforms and Java versions can Alluxio run on?
Alluxio requires JDK 1.8 or JDK 11 and runs on various distributions of Linux as well as macOS.
What license is Alluxio under?
Alluxio is open sourced under the Apache 2.0 license.
Why is my analytics job not running faster after deploying Alluxio?
Some possible reasons to consider:
The job is computation bound and does not spend significant time reading or writing data. Because the bottleneck is not in I/O performance, the benefit from faster Alluxio I/O is small.
The persistent storage is co-located with compute (e.g. Alluxio is connected to a local HDFS) and the input data of the job is in the OS buffer cache.
Due to misconfiguration, clients cannot identify their corresponding local Alluxio worker. They then read from remote Alluxio workers over the network, which lowers data locality.
The input data has not been loaded into Alluxio yet or has already been evicted, so the job reads from the under storage instead of the Alluxio cache.
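The first reason above can be made concrete with a rough estimate. The sketch below applies Amdahl's law, which is a general rule of thumb and not something Alluxio itself computes; the function name and both input values are assumptions chosen for illustration.

```python
def estimated_speedup(io_fraction: float, io_speedup: float) -> float:
    """Amdahl's-law estimate of whole-job speedup when only I/O gets faster.

    io_fraction: assumed share of job runtime spent on I/O (0.0 to 1.0).
    io_speedup:  assumed factor by which I/O alone becomes faster (> 0).
    """
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    if io_speedup <= 0:
        raise ValueError("io_speedup must be positive")
    # Compute time is unchanged; only the I/O share shrinks.
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)


# Computation-bound job: 10% of runtime is I/O, and I/O becomes 10x faster.
print(round(estimated_speedup(0.10, 10.0), 2))  # 1.1
# I/O-bound job: 80% of runtime is I/O, same 10x faster I/O.
print(round(estimated_speedup(0.80, 10.0), 2))  # 3.57
```

Under these assumed numbers, the computation-bound job gains only about 10% while the I/O-bound job runs roughly 3.5 times faster, so measuring how much time a job actually spends on I/O is a useful first check.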
Should I deploy Alluxio as a stand-alone system or through an orchestration framework?
It is recommended to deploy Alluxio as a stand-alone system. Orchestration frameworks supported include:
What programming languages does Alluxio support?
Alluxio is primarily developed in Java and exposes Java-like File APIs for other applications to interact with. Alluxio also supports other language bindings, currently experimental, including Python and Golang.
Alluxio can be run as a FUSE mount exposing a POSIX API. This enables any program that normally accesses a local file system to access data in Alluxio without modification. It is a common way for applications written in non-Java languages, or against non-Hadoop APIs, to access Alluxio data without being rewritten.
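As a minimal sketch of what access through a FUSE mount looks like, the Python snippet below uses only standard file I/O and no Alluxio client library. The mount point `/mnt/alluxio-fuse` and the file name `sample.txt` are hypothetical; substitute the path where Alluxio is actually mounted.

```python
from pathlib import Path

# Hypothetical FUSE mount point; replace with the real mount location.
ALLUXIO_FUSE_MOUNT = Path("/mnt/alluxio-fuse")


def count_lines(path: Path) -> int:
    """Count the lines in a text file using ordinary file I/O."""
    with path.open("r", encoding="utf-8") as f:
        return sum(1 for _ in f)


# Hypothetical file name; the code is identical for any local path.
sample = ALLUXIO_FUSE_MOUNT / "sample.txt"
if sample.exists():
    print(f"{sample} has {count_lines(sample)} lines")
else:
    print(f"{sample} not found; is Alluxio mounted at {ALLUXIO_FUSE_MOUNT}?")
```

Nothing in `count_lines` is specific to Alluxio, which is the point: the same function works on a local file and on a file served through the mount.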
What happens if my data set does not fit in memory?
The input data set does not need to fit in Alluxio storage space for applications to work. Alluxio transparently loads data on demand from the under storage. To fit more data in Alluxio's storage space, configure Alluxio to use other storage resources such as SSD and HDD in addition to memory, which extends Alluxio storage capacity. Read more about Alluxio storage setup here.
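The on-demand loading described above can be illustrated with a toy read-through cache. This is a simplified model, not Alluxio's implementation: the class name, the dictionary standing in for the under storage, and the choice of least-recently-used eviction are all assumptions made for the sketch.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Toy read-through cache with LRU eviction (illustrative only)."""

    def __init__(self, capacity, under_storage):
        self.capacity = capacity            # maximum number of cached blocks
        self.under_storage = under_storage  # dict standing in for the under storage
        self.cache = OrderedDict()          # block id -> data, least recent first
        self.misses = 0                     # reads served from the under storage

    def read(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as most recently used
            return self.cache[block_id]
        # Cache miss: load the block on demand from the under storage.
        self.misses += 1
        data = self.under_storage[block_id]
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the least recently used block
        return data


# Five blocks of data, but room to cache only two of them.
under_storage = {f"block-{i}": f"data-{i}" for i in range(5)}
cache = ReadThroughCache(capacity=2, under_storage=under_storage)

# Every read succeeds even though the data set is larger than the cache.
for block_id in under_storage:
    assert cache.read(block_id) == under_storage[block_id]
print(cache.misses)    # 5: each block was loaded once from the under storage

cache.read("block-4")  # still cached, so no new miss
cache.read("block-0")  # evicted earlier, so it is loaded again
print(cache.misses)    # 6
```

In this model a larger `capacity` means fewer repeated loads, which is the effect that adding SSD or HDD capacity to Alluxio storage is meant to achieve.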