Presto on Iceberg (Experimental)
Presto has introduced support for Iceberg tables in version 0.256.
This document describes how to use Presto to query Iceberg tables through Alluxio. This document is currently experimental, and the information provided here is subject to change.
To use Presto to query an Iceberg table, make sure you have a working setup of Presto, Hive Metastore, and Alluxio, and that Presto can access data through Alluxio's filesystem interface. If not, please refer to the guide on general Presto installation and configuration. Most of that guide applies to Iceberg workflows as well; this document covers the instructions specific to Iceberg tables.
Prerequisites
All prerequisites from the general Presto setup;
Presto server, version 0.257 or later.
Basic Setup
Install Alluxio client jar to Presto Iceberg connector
Copy the Alluxio client jar located at {{site.ALLUXIO_CLIENT_JAR_PATH}} into the Presto Iceberg connector's directory at ${PRESTO_HOME}/plugin/iceberg/. Then restart the Presto server.
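The copy-and-restart step can be sketched as a small shell function. The function name install_alluxio_client is hypothetical, and the sketch assumes a tarball-style Presto installation whose launcher script lives at bin/launcher under the Presto home directory; adjust both paths to your environment.

```shell
# Sketch only: copy the Alluxio client jar into the Iceberg plugin
# directory, then restart Presto so the plugin picks it up.
# install_alluxio_client is an illustrative name, not part of Presto or Alluxio.
install_alluxio_client() {
  jar="$1"          # path to the Alluxio client jar
  presto_home="$2"  # Presto installation directory (PRESTO_HOME)

  cp "$jar" "$presto_home/plugin/iceberg/" || return 1
  # Assumes the standard launcher script shipped with the Presto server tarball.
  "$presto_home/bin/launcher" restart
}
```

A call such as install_alluxio_client /path/to/alluxio-client.jar "$PRESTO_HOME" would perform both steps; the jar path shown is a placeholder.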
Also note that the same client jar file needs to be on Hive's classpath. If not, please refer to the section on setting up Hive to work with Alluxio.
Configure Presto to use the Iceberg connector
Presto reads and writes Iceberg tables using the Iceberg connector. To enable the connector, create a catalog properties file for it in Presto's installation directory at ${PRESTO_HOME}/etc/catalog/iceberg.properties, containing the connector name and the Hive Metastore connection URI.
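As a rough sketch, the catalog file could be generated from the shell as follows. The function name write_iceberg_catalog and the example metastore address are placeholders introduced for illustration; the sketch assumes that connector.name and hive.metastore.uri are the only properties needed, which may not hold for every Presto version.

```shell
# Sketch only: write a minimal Iceberg catalog file for Presto.
# write_iceberg_catalog is an illustrative name, not a Presto command.
write_iceberg_catalog() {
  presto_home="$1"    # Presto installation directory (PRESTO_HOME)
  metastore_uri="$2"  # e.g. thrift://localhost:9083 (assumed placeholder)

  mkdir -p "$presto_home/etc/catalog" || return 1
  cat > "$presto_home/etc/catalog/iceberg.properties" <<EOF
connector.name=iceberg
hive.metastore.uri=$metastore_uri
EOF
}
```

For example, write_iceberg_catalog "$PRESTO_HOME" thrift://localhost:9083 writes a two-line file; restart Presto afterwards so the new catalog is loaded.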
Change the Hive Metastore connection URI to match your setup.
Examples: Use Presto to Query Iceberg Tables on Alluxio
Create a schema and an Iceberg table
For demonstration purposes, we will create an example schema and an Iceberg table.
Launch the Presto CLI client with the following command:
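A launch command along these lines should work. The wrapper name launch_presto_cli, the CLI path ./presto, and the coordinator address localhost:8080 are assumptions; only the --server and --catalog options of the Presto CLI are used.

```shell
# Sketch only: start an interactive Presto CLI session with the
# catalog preset to "iceberg".
# launch_presto_cli is an illustrative name; the defaults are assumptions.
launch_presto_cli() {
  cli="${1:-./presto}"            # path to the Presto CLI executable
  server="${2:-localhost:8080}"   # Presto coordinator host:port

  "$cli" --server "$server" --catalog iceberg
}
```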
For more information on the client, please refer to this section on querying tables using Presto. Note that the catalog is set to iceberg since we will be working with Iceberg tables.
Run the following statements from the client:
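The statements can also be submitted in batch mode through the CLI's --execute option, as in the sketch below. The column list (name, age, id) and the function name create_person_table are hypothetical. The location and format table properties mirror what this guide describes (a table directory in Alluxio and Parquet storage), but the exact property names should be checked against the Iceberg connector documentation for your Presto version.

```shell
# Sketch only: create the example schema and table via the Presto CLI.
# create_person_table and the column definitions are illustrative assumptions.
create_person_table() {
  cli="$1"          # path to the Presto CLI executable
  server="$2"       # Presto coordinator host:port
  alluxio_uri="$3"  # e.g. alluxio://localhost:19998 (assumed placeholder)

  "$cli" --server "$server" --catalog iceberg --execute "
    CREATE SCHEMA iceberg_test;
    CREATE TABLE iceberg_test.person (name varchar, age int, id int)
      WITH (location = '${alluxio_uri}/person', format = 'parquet');
  "
}
```

The table name is schema-qualified so that the sketch does not depend on the session's current schema.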
Change the hostname and port in the Alluxio connection URI to match your setup.
These statements create a schema iceberg_test and a table person at the directory /person in the Alluxio filesystem, with Parquet as the table's storage format.
Insert sample data into the table
Insert one row of sample data into the newly created table:
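A single-row insert might look like the following sketch. It assumes the person table has three columns of types varchar, int, and int, which is an illustrative layout rather than one confirmed by this guide; the row values and the function name insert_sample_row are likewise made up.

```shell
# Sketch only: insert one sample row into iceberg_test.person.
# Assumes a hypothetical (varchar, int, int) column layout for the table.
insert_sample_row() {
  cli="$1"     # path to the Presto CLI executable
  server="$2"  # Presto coordinator host:port

  "$cli" --server "$server" --catalog iceberg --schema iceberg_test \
    --execute "INSERT INTO person VALUES ('alice', 18, 1000);"
}
```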
Note: earlier versions of Presto's Iceberg connector had a bug in the write path, so the insertion may fail on those versions. The issue was resolved in Presto version 0.257 by this PR.
Now you can verify things are working by reading back the data from the table:
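One way to script this check is sketched below. The function name verify_person_rows is hypothetical. The sketch requests CSV output explicitly and treats non-empty output as success, on the assumption that the CLI prints nothing to standard output when the query returns no rows.

```shell
# Sketch only: read the table back and succeed only if rows are returned.
# verify_person_rows is an illustrative name, not a Presto command.
verify_person_rows() {
  cli="$1"     # path to the Presto CLI executable
  server="$2"  # Presto coordinator host:port

  rows="$("$cli" --server "$server" --catalog iceberg --schema iceberg_test \
    --output-format CSV --execute "SELECT * FROM person;")" || return 1
  printf '%s\n' "$rows"
  # CSV output has no header line, so any output means at least one row.
  [ -n "$rows" ]
}
```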
You can also examine the table's files in Alluxio.
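Listing the table directory recursively with the Alluxio filesystem shell is one way to do this. The sketch assumes the Alluxio launcher is at ./bin/alluxio relative to the Alluxio installation directory and that the table lives at /person; the function name list_table_files is hypothetical.

```shell
# Sketch only: recursively list the Iceberg table's directory in Alluxio.
# list_table_files is an illustrative name; the default path is an assumption.
list_table_files() {
  alluxio_bin="${1:-./bin/alluxio}"  # path to the Alluxio launcher script

  "$alluxio_bin" fs ls -R /person
}
```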
You can see the metadata and data files of the Iceberg table have been created.