Alluxio DA-3.5 (stable)
Data Lake Connectors


Data lake connectors enable compute engines such as Trino and Spark to query data as structured tables.

The supported connectors include:

  • Apache Hive
  • Apache Iceberg
  • Delta Lake

Instructions for configuring each of these connectors are described in the documentation for the respective compute engine.

Known limitations

Iceberg

Because Iceberg manages table metadata through files, it is highly recommended to avoid caching those metadata files. If metadata files are persisted to the cache, subsequent accesses to the table may produce errors and/or warnings.

After determining the locations of the metadata files, set the paths as skipCache via the cache filter feature.
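For reference, an Iceberg table keeps its metadata files under the table's `metadata/` directory. The paths below (shown for a hypothetical table location; the bucket and table names are placeholders) illustrate the kinds of files to mark as skipCache — in practice, skipping the entire `metadata/` prefix is the simplest approach:

```
s3://my-bucket/warehouse/db/table/metadata/*.metadata.json    table metadata (a new file per commit)
s3://my-bucket/warehouse/db/table/metadata/snap-*.avro        manifest lists
s3://my-bucket/warehouse/db/table/metadata/*-m*.avro          manifest files
s3://my-bucket/warehouse/db/table/metadata/version-hint.text  mutable pointer file (Hadoop catalog only)
```

Consult the Cache Filter Policy page for the exact configuration syntax used to declare skipCache paths.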

Caching data when writing to HDFS

When writing data with HDFS as the UFS, the data is not cached at write time, even when the write type is configured to persist data to the cache. Newly written data only enters the Alluxio cache on the first (cold) read. Note that this behavior was observed with Trino connecting to HDFS, but not with Trino connecting to S3.
