
Presto on Iceberg (Experimental)


Presto has introduced support for Iceberg tables in version 0.256.

This document describes how to use Presto to query Iceberg tables through Alluxio. This document is currently experimental, and the information provided here is subject to change.

In order to use Presto to query an Iceberg table, make sure you have a working setup of Presto, Hive Metastore and Alluxio, and that Presto can access data through Alluxio's filesystem interface. If not, please refer to the guide on general Presto installation and configuration. Most of that guide applies to Iceberg workflows as well; this document covers the instructions specific to working with Iceberg tables.

Prerequisites

  • All prerequisites from the general Presto setup;

  • Presto server, version 0.257 or later.

Basic Setup

Install Alluxio client jar to Presto Iceberg connector

Copy the Alluxio client jar, located at {{site.ALLUXIO_CLIENT_JAR_PATH}}, into the Presto Iceberg connector's directory at ${PRESTO_HOME}/plugin/iceberg/. Then restart the Presto server:

$ ${PRESTO_HOME}/bin/launcher restart

Also note that the same client jar file needs to be on Hive's classpath. If it is not, please refer to the section on setting up Hive to work with Alluxio.
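
As a rough sketch of these copy steps (assuming Hive picks up auxiliary jars from ${HIVE_HOME}/lib; adjust both paths to your environment), the commands might look like:

# copy the Alluxio client jar into the Presto Iceberg connector's plugin directory
$ cp {{site.ALLUXIO_CLIENT_JAR_PATH}} ${PRESTO_HOME}/plugin/iceberg/
# also place the same jar on Hive's classpath, e.g. its lib directory
$ cp {{site.ALLUXIO_CLIENT_JAR_PATH}} ${HIVE_HOME}/lib/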

Configure Presto to use the Iceberg connector

Presto reads and writes Iceberg tables using the Iceberg connector. To enable the Iceberg connector, create a catalog for it in Presto's installation directory as ${PRESTO_HOME}/etc/catalog/iceberg.properties:

connector.name=iceberg
hive.metastore.uri=thrift://localhost:9083

Change the Hive Metastore connection URI to match your setup.
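
Once you connect a Presto client (see the CLI example below), one quick sanity check, not part of the original setup steps, is to list the catalogs and confirm the new iceberg catalog appears:

SHOW CATALOGS;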

Examples: Use Presto to Query Iceberg Tables on Alluxio

Create a schema and an Iceberg table

For demonstration purposes, we will create an example schema and an Iceberg table.

Launch the Presto CLI client with the following command:

./presto --server localhost:8080 --catalog iceberg --debug

Run the following statements from the client:

CREATE SCHEMA iceberg_test;
USE iceberg_test;
CREATE TABLE person (name varchar, age int, id int)
    WITH (location = 'alluxio://localhost:19998/person', format = 'parquet');

Change the hostname and port in the Alluxio connection URI to match your setup.

These statements create a schema iceberg_test and a table person under the directory /person in the Alluxio filesystem, with Parquet as the table's storage format.
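
If you want to verify the objects were created, a few standard Presto statements can be run from the same CLI session (an optional check added here for illustration):

-- list schemas in the iceberg catalog
SHOW SCHEMAS;
-- confirm the table exists and inspect its columns
SHOW TABLES IN iceberg_test;
DESCRIBE person;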

Insert sample data into the table

Insert one row of sample data into the newly created table:

INSERT INTO person VALUES ('alice', 18, 1000);

Now you can verify things are working by reading back the data from the table:

SELECT * FROM person;

You can also examine the files in Alluxio:

$ bin/alluxio fs ls /person
drwxr-xr-x  alluxio    alluxio    10    PERSISTED 06-29-2021 16:24:02:007  DIR /person/metadata
drwxr-xr-x  alluxio    alluxio     1    PERSISTED 06-29-2021 16:24:00:049  DIR /person/data
$ bin/alluxio fs ls /person/data
-rw-r--r--  alluxio    alluxio   400    PERSISTED 06-29-2021 16:24:00:691 100% /person/data/6e6a451a-8f20-4d73-9ef6-ee48070dad27.parquet
$ bin/alluxio fs ls /person/metadata
-rw-r--r--  alluxio    alluxio  1406    PERSISTED 06-29-2021 16:23:28:608 100% /person/metadata/00000-2fd982ae-2a81-44a8-a4db-505e9ba6c09d.metadata.json
...
(snip)

You can see the metadata and data files of the Iceberg table have been created.
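
The Iceberg connector also exposes hidden metadata tables that can be queried like regular tables; assuming your Presto version supports them, the following queries show the snapshots and data files Iceberg is tracking for the table:

-- snapshot history of the table
SELECT * FROM "person$snapshots";
-- data files currently referenced by the table
SELECT * FROM "person$files";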

For more information on the Presto CLI client, please refer to this section on querying tables using Presto. Note that the catalog is set to iceberg since we are dealing with Iceberg tables.

Note: there was a bug in the write path of Presto's Iceberg connector which could cause insertions to fail. This issue has been resolved in Presto version 0.257 by this PR.
