# Data Lake Connectors

Data lake connectors let compute engines such as Trino and Spark query data as structured tables.

Alluxio supports the following connectors:

* [Apache Hive](https://hive.apache.org/)
* [Apache Iceberg](https://iceberg.apache.org/)
* [Delta Lake](https://delta.io/)

For instructions on configuring each connector, see the documentation for the respective compute engine.

* [Trino](https://documentation.alluxio.io/ee-da-cn/compute/pages/xIUhVARhNbl4mIf9g9ml#配置-additionalCatalogs)

## Known Limitations

### Iceberg

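Because Iceberg manages its metadata through files, we strongly recommend not caching those metadata files. If metadata files are persisted into the cache, errors and/or warnings may occur when the related files are accessed.

After determining the location of the metadata files, use the [cache filter feature](/ee-da-cn/cache/cache-filter-policy.md) to set these paths to `skipCache`.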
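As a rough aid for locating those paths, the sketch below classifies a file path as Iceberg metadata or not. It assumes the table uses Iceberg's default layout, in which metadata files sit directly under `<table location>/metadata/`; tables that override the location (for example through the `write.metadata.path` table property) would need a different check. The function names and example paths are hypothetical, and the sketch does not produce Alluxio cache filter configuration; see the cache filter documentation for the actual rule format.

```python
from urllib.parse import urlparse


def is_iceberg_metadata(path: str) -> bool:
    """Return True if the file sits directly inside a 'metadata' directory.

    Assumption: default Iceberg layout (<table location>/metadata/<file>).
    """
    parts = urlparse(path).path.strip("/").split("/")
    return len(parts) >= 2 and parts[-2] == "metadata"


def metadata_prefix(table_location: str) -> str:
    """Build the metadata directory prefix for a table location (hypothetical helper)."""
    return table_location.rstrip("/") + "/metadata/"


if __name__ == "__main__":
    # Example paths are made up for illustration.
    for p in (
        "s3://bucket/warehouse/db/tbl/metadata/00001-abc.metadata.json",
        "s3://bucket/warehouse/db/tbl/data/part-0.parquet",
    ):
        print(p, "->", "skip cache" if is_iceberg_metadata(p) else "cacheable")
```

The prefix returned by `metadata_prefix` is the kind of path one might then mark as `skipCache` in the cache filter.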

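#### Caching Data When Writing to HDFS

When data is written with HDFS as the UFS, it is not cached at write time, even if the write type is configured to persist data to the cache. The data is persisted into the Alluxio cache only on a cold read of the newly written data. Note that this behavior can be observed when Trino connects to HDFS, but not when Trino connects to S3.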

