在 K8s 上运行 Trino

Trino 是一个开源的分布式SQL查询引擎，用于进行大规模的数据交互式分析查询。本指南介绍了如何将Alluxio作为分布式缓存层，通过Trino 对Alluxio所支持的任何数据存储系统，例如AWS S3和HDFS，进行查询。部署Alluxio后，Trino 能够访问来自任何数据源的数据，并能透明地将频繁访问的数据（例如，常用表）缓存到Alluxio分布式存储中。

先决条件

本指南假定 Alluxio集群已部署在Kubernetes上。

此外，还需使用docker来构建自定义的 Trino 镜像。

准备镜像

要与 Trino 集成，须在 Trino 镜像中添加 Alluxio 的 jar 文件和配置文件。 Trino 镜像修改后需启动 Trino 容器才能连接到 Alluxio 集群。

在Alluxio安装说明中列出的下载文件中，找到名为alluxio-enterprise-DA-3.5-10.2.0-release.tar.gz的压缩包。从压缩包解压以下 Alluxio jar文件：

client/alluxio-DA-3.5-10.2.0-client.jar
client/ufs/alluxio-underfs-s3a-shaded-DA-3.5-10.2.0.jar （如果使用 S3 存储桶作为 UFS）
client/ufs/alluxio-underfs-hadoop-3.3-shaded-DA-3.5-10.2.0.jar （如果使用 HDFS 作为 UFS）

准备一个空目录作为构建镜像的工作目录。在此目录中，创建目录files/alluxio/，并将前面提到的jar文件复制到其中。下载commons-lang3 jar到files/目录中，使其也包含在镜像中。

wget https://repo1.maven.org/maven2/org/apache/commons/commons-lang3/3.14.0/commons-lang3-3.14.0.jar -O files/commons-lang3-3.14.0.jar

创建一个Dockerfile 来修改基础 Trino 镜像。定义参数示例如下：

TRINO_VERSION=449 作为 Trino 的版本
UFS_JAR=files/alluxio/alluxio-underfs-s3a-shaded-DA-3.5-10.2.0.jar 作为复制到 files/alluxio/ 的 UFS jar 的路径
CLIENT_JAR=files/alluxio/alluxio-DA-3.5-10.2.0-client.jar 作为复制到 files/alluxio/ 的 Alluxio 客户端 jar 的路径
COMMONS_LANG_JAR=files/commons-lang3-3.14.0.jar作为下载到 files/ 的 commons-lang3 jar 的路径

ARG  TRINO_VERSION=449
ARG  IMAGE=trinodb/trino:${TRINO_VERSION}
FROM $IMAGE

ARG  UFS_JAR=files/alluxio/alluxio-underfs-s3a-shaded-DA-3.5-10.2.0.jar
ARG  CLIENT_JAR=files/alluxio/alluxio-DA-3.5-10.2.0-client.jar
ARG  COMMONS_LANG_JAR=files/commons-lang3-3.14.0.jar

# 删除当前的 Alluxio jar 文件
RUN  rm /usr/lib/trino/plugin/hive/alluxio*.jar \
     && rm /usr/lib/trino/plugin/delta-lake/alluxio*.jar \
     && rm /usr/lib/trino/plugin/iceberg/alluxio*.jar

# 将 jar 文件复制到镜像中
ENV  ALLUXIO_HOME=/usr/lib/trino/alluxio
RUN  mkdir -p $ALLUXIO_HOME/lib && mkdir -p $ALLUXIO_HOME/conf
COPY --chown=trino:trino $UFS_JAR          /usr/lib/trino/alluxio/lib/
COPY --chown=trino:trino $CLIENT_JAR       /usr/lib/trino/alluxio/lib/
COPY --chown=trino:trino $COMMONS_LANG_JAR /usr/lib/trino/alluxio/lib/

# 将 UFS jar 链接到 lib/ 目录，并将客户端和 commons-lang3 jar 链接到每个插件目录 
RUN  ln -s $ALLUXIO_HOME/lib/alluxio-underfs-*.jar /usr/lib/trino/lib/
RUN  ln -s $ALLUXIO_HOME/lib/alluxio-*-client.jar    /usr/lib/trino/plugin/hive/hdfs/ \
     && ln -s $ALLUXIO_HOME/lib/alluxio-*-client.jar /usr/lib/trino/plugin/delta-lake/hdfs/ \
     && ln -s $ALLUXIO_HOME/lib/alluxio-*-client.jar /usr/lib/trino/plugin/iceberg/hdfs/
RUN  ln -s $ALLUXIO_HOME/lib/commons-lang3-3.14.0.jar     /usr/lib/trino/plugin/hive/hdfs/ \
     && ln -s $ALLUXIO_HOME/lib/commons-lang3-3.14.0.jar  /usr/lib/trino/plugin/delta-lake/hdfs/ \
     && ln -s $ALLUXIO_HOME/lib/commons-lang3-3.14.0.jar  /usr/lib/trino/plugin/iceberg/hdfs/

USER trino:trino

上述操作会把必要的 jar 文件复制到镜像中。对于UFS jar，在 Trino 的 lib/目录中会创建一个指向它的symlink。对于 Alluxio 客户端和commons-lang3 jars，将遍历每个可能的插件目录，并创建指向它们的symlink。

在上述示例中，jar 文件被复制到Hive、Delta Lake和Iceberg的插件目录中，但根据使用的 connector 不同，可能只需要其中的部分文件。
需注意的是，对于早于434的 Trino 版本而言，复制CLIENT_JAR和COMMONS_LANG_JAR的目标目录应为 /usr/lib/trino/plugin/<PLUGIN_NAME>/ ，而不是 /usr/lib/trino/plugin/<PLUGIN_NAME>/hdfs/。

运行以下命令来构建镜像，并将 <PRIVATE_REGISTRY> 替换为您的私有容器仓库的URL，将 <TRINO_VERSION> 替换为相应的 Trino 版本。

$ docker build --platform linux/amd64 -t <PRIVATE_REGISTRY>/alluxio/trino:<TRINO_VERSION> .

通过运行以下命令来上传镜像：

$ docker push <PRIVATE_REGISTRY>/alluxio/trino:<TRINO_VERSION>

部署 Hive Metastore

要完成 Trino的端到端示例，需要启动一个Hive Metastore。如果已有可用的Hive Metastore，则可跳过该步骤。配置Trino catalog 需要 Hive Metastore 的URI。

使用helm创建一个独立的Hive Metastore。

运行以下命令添加helm仓库：

$ helm repo add heva-helm-charts https://hevaweb.github.io/heva-helm-charts/

运行以下命令安装Hive集群：

$ helm install hive-metastore heva-helm-charts/hive-metastore

查看 Hive pods 的状态:

$ kubectl get pods

即使hive-metastore-db-init-schema-* Pods的状态会出现Error，也不会影响其余的工作流程。

完成后，通过运行以下命令删除Hive集群：

$ helm uninstall hive-metastore

为 S3 配置 AWS 凭证

如果使用S3，通过编辑 configmap 配置AWS凭证：

$ kubectl edit configmap hive-metastore

在编辑器中，搜索hive-site.xml属性fs.s3a.access.key和fs.s3a.secret.key，输入 AWS凭证来访问S3存储桶。

删除Hive Metastore pod 可使其重启：

$ kubectl delete pod hive-metastore-0

Hive Metastore pod 将自动重启。

配置 Trino

在安装目录中创建一个 trino-alluxio.yaml 配置文件，专门用于Alluxio的配置。我们会把这个配置文件分解成几个部分，然后合并成一个单独的yaml文件。

配置镜像和服务器

image:
  registry: <PRIVATE_REGISTRY>
  repository: <PRIVATE_REPOSITORY>
  tag: <TRINO_VERSION>
server:
  workers: 2
  log:
    trino:
      level: INFO
  config:
    query:
      maxMemory: "4GB"

指定之前构建并上传的自定义 Trino 镜像的位置，并根据需要自定义 Trino 服务器的配置。

配置 additionalCatalogs

additionalCatalogs:
  memory: |
    connector.name=memory
    memory.max-data-per-node=128MB

除了内存相关配置外，这一部分还应添加 catalog 配置。根据所使用的 connector 不同，在 additionalCatalogs 下添加相应的 catalog 配置。如果按照前一章节所述启动 Hive Metastore，hive.metastore.uri 的值应为 thrift://hive-metastore.default:9083。

Hive

  # 使用 alluxio 的 connector
  hive: |
    connector.name=hive
    fs.hadoop.enabled=true
    # use the uri of your hive-metastore deployment
    hive.metastore.uri=thrift://<HIVE_METASTORE_URI>
    hive.config.resources=/etc/trino/alluxio-core-site.xml
    hive.non-managed-table-writes-enabled=true
    hive.s3-file-system-type=HADOOP_DEFAULT

如果使用 Trino 458 或更高版本，则需要 fs.hadoop.enabled=true。

Delta Lake

  # 使用 alluxio 的 connector
  delta_lake: |
    connector.name=delta_lake
    fs.hadoop.enabled=true
    hive.metastore.uri=thrift://<HIVE_METASTORE_URI>
    hive.config.resources=/etc/trino/alluxio-core-site.xml
    hive.s3-file-system-type=HADOOP_DEFAULT
    delta.enable-non-concurrent-writes=true
    delta.vacuum.min-retention=1h
    delta.register-table-procedure.enabled=true
    delta.extended-statistics.collect-on-write=false

如果使用 Trino 458 或更高版本，则需要 fs.hadoop.enabled=true。

Iceberg

  # 使用 alluxio 的 connector
  iceberg: |
    connector.name=iceberg
    fs.hadoop.enabled=true
    hive.metastore.uri=thrift://<HIVE_METASTORE_URI>
    hive.config.resources=/etc/trino/alluxio-core-site.xml

如果使用 Trino 458 或更高版本，则需要 fs.hadoop.enabled=true。

配置 coordinator 和 worker

coordinator 和worker的配置都需要对其 additionalJVMConfig 和 additionalConfigFiles 部分进行类似的修改。

对于coordinator:

coordinator:
  jvm:
    maxHeapSize: "8G"
    gcMethod:
      type: "UseG1GC"
      g1:
        heapRegionSize: "32M"
  config:
    memory:
      heapHeadroomPerNode: ""
    query:
      maxMemoryPerNode: "1GB"
  additionalJVMConfig:
  - "-Dalluxio.home=/usr/lib/trino/alluxio"
  - "-Dalluxio.conf.dir=/etc/trino/"
  - "-XX:+UnlockDiagnosticVMOptions"
  - "-XX:G1NumCollectionsKeepPinned=10000000"
  - "-Djava.security.manager=allow"
  additionalConfigFiles:
    alluxio-core-site.xml: |
      <configuration>
        <property>
          <name>fs.s3a.impl</name>
          <value>alluxio.hadoop.FileSystem</value>
        </property>
        <property>
          <name>fs.hdfs.impl</name>
          <value>alluxio.hadoop.FileSystem</value>
        </property>
      </configuration>
    alluxio-site.properties: |
      alluxio.etcd.endpoints: http://<ETCD_POD_NAME>:2379
      alluxio.cluster.name: default-alluxio
      alluxio.k8s.env.deployment: true
      alluxio.mount.table.source: ETCD
      alluxio.worker.membership.manager.type: ETCD
    metrics.properties: |
      # Enable the Alluxio Jmx sink
      sink.jmx.class=alluxio.metrics.sink.JmxSink

如果使用 Trino 464 或更高版本，则需要 "-Djava.security.manager=allow" 才能与 Java 23 兼容。

注意：需要添加 2 个额外的 JVM 配置项，用于定义 alluxio.home 和 alluxio.conf.dir 属性。

在 additionalConfigFiles 下，定义了 3 个 Alluxio 配置文件。

alluxio-core-site.xml：该 XML 文件定义了应为特定的文件 URI scheme 使用哪个底层文件系统类。通过设置 alluxio.hadoop.FileSystem值，确保文件通过 Alluxio 文件系统执行。根据底层文件系统（UFS），设置相应的配置：
- 如果是 S3, 设置 fs.s3a.impl to alluxio.hadoop.FileSystem
- 如果是 HDFS, 设置 fs.hdfs.impl to alluxio.hadoop.FileSystem
alluxio-site.properties: 该属性文件定义了 Alluxio 客户端的具体配置属性。下一章节将描述如何填写此文件。
metrics.properties: 添加这一行可从 Trino 抓取 Alluxio 的专属指标。

对于 worker，同样添加 2 个额外的JVM 配置，并在 additionalConfigFiles 下同样复制3 个 Alluxio 配置文件。

worker:
  jvm:
    maxHeapSize: "8G"
    gcMethod:
      type: "UseG1GC"
      g1:
        heapRegionSize: "32M"
  config:
    memory:
      heapHeadroomPerNode: ""
    query:
      maxMemoryPerNode: "1GB"
  additionalJVMConfig:
  - "-Dalluxio.home=/usr/lib/trino/alluxio"
  - "-Dalluxio.conf.dir=/etc/trino/"
  - "-XX:+UnlockDiagnosticVMOptions"
  - "-XX:G1NumCollectionsKeepPinned=10000000"
  - "-Djava.security.manager=allow"
  additionalConfigFiles:
    alluxio-core-site.xml: |
      <configuration>
        <property>
          <name>fs.s3a.impl</name>
          <value>alluxio.hadoop.FileSystem</value>
        </property>
        <property>
          <name>fs.hdfs.impl</name>
          <value>alluxio.hadoop.FileSystem</value>
        </property>
      </configuration>
    alluxio-site.properties: |
      alluxio.etcd.endpoints: http://<ETCD_POD_NAME>:2379
      alluxio.cluster.name: default-alluxio
      alluxio.k8s.env.deployment: true
      alluxio.mount.table.source: ETCD
      alluxio.worker.membership.manager.type: ETCD
    metrics.properties: |
      sink.jmx.class=alluxio.metrics.sink.JmxSink

如果使用 Trino 464 或更高版本，则需要 "-Djava.security.manager=allow" 才能与 Java 23 兼容。

alluxio-site.properties

要使 Trino 集群与 Alluxio 集群正常通信，须在 Alluxio 客户端和 Alluxio 服务器之间对齐某些属性。

要显示来自 Alluxio 集群配置映射的 alluxio-site.properties，运行：

$ kubectl get configmap alluxio-alluxio-conf -o yaml

其中 alluxio-alluxio-conf 是 Alluxio 集群的 ConfigMap 名称。

在此输出中，搜索 alluxio-site.properties 下的以下属性，并为coordinator 和 worker 的 alluxio-site.properties 设置这些属性：

alluxio.etcd.endpoints: Alluxio 集群的 ETCD 主机。假设从 Trino 集群可以访问这些主机名可，则可以在 Trino 的 alluxio-site.properties.alluxio.etcd.endpoints 文件中使用相同的值。
alluxio.cluster.name: 必须设置为与 Trino 集群完全相同的值。
alluxio.k8s.env.deployment=true
alluxio.mount.table.source=ETCD
alluxio.worker.membership.manager.type=ETCD

如果按照在 Kubernetes 上安装 Alluxio 的说明，值应该是：

    alluxio-site.properties: |
      alluxio.etcd.endpoints: http://alluxio-etcd.default:2379
      alluxio.cluster.name: default-alluxio
      alluxio.k8s.env.deployment: true
      alluxio.mount.table.source: ETCD
      alluxio.worker.membership.manager.type: ETCD

启动 Trino

运行以下命令，将 Helm Chart 添加到本地仓库：

$ helm repo add trino https://trinodb.github.io/charts

运行以下命令来部署 Trino:

$ helm install -f trino-alluxio.yaml trino-cluster trino/trino

检查 Trino pod 的状态，并确定 Trino coordinator 的 pod 名称：

$ kubectl get pods

完成后，通过运行以下命令删除 Trino 集群：

$ helm uninstall trino-cluster

执行查询

运行以下命令，进入 Trino coordinator pod：

$ kubectl exec -it $POD_NAME -- /bin/bash

其中 $POD_NAME 是 Trino coordinator 的完整 pod 名称，以 trino-cluster-trino-coordinator- 开头。

在 Trino pod 内部，运行：

$ trino

要在 UFS 挂载中创建一个简单的 schema，运行：

trino> CREATE SCHEMA hive.example WITH (location = '<UFS_URI>');

其中 UFS_URI 是一个已定义的 Alluxio 挂载内的路径。以 S3 存储桶为例，该路径可能是s3a://myBucket/myPrefix/。

Last updated 7 months ago