Release Notes
Alluxio Enterprise AI 3.7
What is New
Transparent Distributed S3 Cache with Sub-ms Latency
AI/ML workloads (e.g., PyTorch, TensorFlow) rely on Amazon S3 (or S3-compatible storage) for scalable data access but face throughput and latency challenges. Alluxio Enterprise AI bridges this gap by deploying an Alluxio cache colocated with GPUs, which delivers single-digit-millisecond latency and high-throughput data access through an S3-compatible interface.
Use Cases:
Model training accesses datasets through the S3 interface and requires faster performance
Model deployment loads model files through the S3 interface and requires faster performance
Model inference loads features from Parquet files on AWS S3
Key Benefits
Faster AI/ML Workloads
Cache data in GPU-node NVMe, eliminating S3 fetch delays for repeated reads.
Achieve near-local NVMe throughput and latency for iterative workloads (e.g., model training).
Reduce Cloud Access Costs
Co-locating the cache with GPUs cuts egress and API call fees by up to 70%.
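The benefit for repeated reads can be illustrated with a minimal read-through cache sketch. This is not Alluxio's implementation: the `ReadThroughCache` class, the `slow_remote_fetch` function, and the simulated delay are all hypothetical stand-ins, with an in-memory dictionary playing the role of GPU-node NVMe and a sleeping function playing the role of an S3 GET.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


class ReadThroughCache:
    """Toy read-through cache: the first read of a key goes to the
    backing store, and every later read is served from local storage."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # stands in for a remote S3 GET
        self._store: Dict[str, bytes] = {}   # stands in for local NVMe
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        data = self._fetch(key)
        self._store[key] = data
        return data


def slow_remote_fetch(key: str) -> bytes:
    """Hypothetical remote read; the delay is arbitrary, not a measurement."""
    time.sleep(0.001)
    return f"contents of {key}".encode()


def run_epochs(cache: ReadThroughCache, keys: Iterable[str], epochs: int) -> Tuple[int, int]:
    """Read every key once per epoch, as an iterative training job would."""
    keys = list(keys)
    for _ in range(epochs):
        for key in keys:
            cache.read(key)
    return cache.hits, cache.misses
```

In this sketch, only the first epoch pays the remote fetch cost; each later epoch reads every object from the local store, which is the access pattern iterative workloads such as model training tend to follow.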
Performance Results
Single-digit-millisecond latency - Alluxio offers up to 45x lower latency than standard AWS S3 and up to 5x lower latency than AWS S3 Express One Zone
High throughput - Alluxio delivers up to 11.5 GiB/s (or 98.7 Gbps) on a 100 Gbps network, which is 2x the read throughput of same-region AWS S3
Performance scales out linearly
Alluxio transforms S3 into a high-throughput, low-latency data hub for AI – eliminating I/O waits while slashing cloud costs.
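As a quick sanity check on the throughput figure above, the conversion from GiB/s (binary units) to Gbps (decimal units) can be worked through in a few lines. The function names here are illustrative only. The result comes out to roughly 98.8 Gbps, in line with the quoted 98.7 Gbps up to rounding, and suggests the 100 Gbps link is close to saturated.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def gib_per_s_to_gbps(gib_per_s: float) -> float:
    """Convert GiB/s (binary gigabytes) to Gbps (decimal gigabits per second)."""
    return gib_per_s * GIB * 8 / 1e9


def link_utilization(gib_per_s: float, link_gbps: float) -> float:
    """Fraction of the network link consumed by the given throughput."""
    return gib_per_s_to_gbps(gib_per_s) / link_gbps


if __name__ == "__main__":
    print(f"{gib_per_s_to_gbps(11.5):.2f} Gbps")
    print(f"{link_utilization(11.5, 100.0):.1%} of a 100 Gbps link")
```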


High-performance AI Data Preprocessing Through Spark
Alluxio offers a new capability to boost AI data preprocessing through Spark Streaming and ETL pipelines. By leveraging native Spark integration and its distributed caching architecture, Alluxio accelerates data processing workflows for AI/ML applications.
Use Cases:
In an AI pipeline, users need to preprocess datasets faster through Spark
Key Benefits:
Faster AI Workloads: Reduces data loading/transformation time.
Simplified Scalability: Handles petabyte-scale datasets without pipeline redesigns.
Seamless Integration: Works with existing Spark code and storage systems (HDFS, S3, etc.).
Performance Results: In the TPC-DS SF100 (100GB) benchmark, Alluxio improves query performance by up to 3× compared to direct access to AWS S3. On average, it achieves a 32% speedup across 135 queries.

This enhancement is particularly valuable for ML engineers and data teams managing feature engineering at scale while maintaining compatibility with standard Spark ecosystems.
5x Faster Cache Preloading with Partitioned and Parallel Processing for Large Files
Alluxio supports the ability to preload data from the underlying storage (UFS) to its cache. An enhancement to this functionality introduces a partitioned and parallel data loading mechanism, providing 5x faster performance when dealing with large files (typically >1GB). The new mechanism ensures faster, more efficient data transfers to Alluxio's cache.
Use Cases:
Model Training: Requires fast access to preloaded datasets to accelerate the training process and reduce data loading delays.
Model Deployment: Demands shorter cold start times by quickly loading large model files, ensuring faster inference and responsiveness.
Key Enhancements:
Partitioned Data Loading:
Large files are split into smaller, manageable chunks (partitions) for faster loading.
Partitioning ensures that each chunk can be handled independently, leading to better scalability and resource utilization.
Parallel Data Loading:
Each partition is loaded in parallel, drastically reducing the time required to load the entire file.
This parallelism maximizes available bandwidth and computational resources, leading to a performance boost.
Resource Efficiency:
The partitioned approach distributes the load evenly across available compute resources, ensuring balanced utilization of system resources.
This results in reduced bottlenecks and increased throughput.
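The partition-then-load-in-parallel idea can be sketched as follows. This is a single-machine illustration under stated assumptions, not Alluxio's loader: a local file stands in for the UFS object, threads stand in for the cluster's load workers, and the names `plan_partitions`, `read_range`, and `parallel_load` are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def plan_partitions(file_size: int, partition_size: int) -> List[Tuple[int, int]]:
    """Split [0, file_size) into (offset, length) ranges of at most
    partition_size bytes each; the last range may be shorter."""
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    return [
        (offset, min(partition_size, file_size - offset))
        for offset in range(0, file_size, partition_size)
    ]


def read_range(path: str, offset: int, length: int) -> bytes:
    """Read one partition independently of all the others."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def parallel_load(path: str, partition_size: int, workers: int = 4) -> bytes:
    """Load every partition concurrently and reassemble them in order."""
    partitions = plan_partitions(os.path.getsize(path), partition_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so the chunks
        # can be joined directly even if they finish out of order.
        chunks = pool.map(lambda p: read_range(path, *p), partitions)
        return b"".join(chunks)
```

Because each range is read with its own file handle and offset, no partition waits on another, which is what allows the load to use the available bandwidth instead of streaming one large file sequentially.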
Role-based Access Control (RBAC) S3 Access
Alluxio's new Role-based Access Control (RBAC) S3 access feature enhances data security and control. This functionality allows administrators to define granular access permissions (read/write) or integrate existing authentication and authorization services for S3 data through Alluxio's unified namespace.
Authentication: Supports OIDC/OAuth 2.0-based authentication with providers such as Okta, Cognito, and Microsoft AD
Authorization: Supports Ranger
The feature bridges compliance gaps by extending enterprise-grade authentication and authorization to S3 data while maintaining Alluxio’s caching and acceleration benefits.
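To make the idea of granular read/write permissions concrete, here is a minimal role-based check. It is a conceptual sketch only: the `Grant` and `Policy` types, the role names, and the path-prefix matching rule are all assumptions made for illustration, and they do not reflect Alluxio's or Ranger's actual policy model or configuration format.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Grant:
    """Hypothetical grant: a set of actions allowed under a path prefix."""
    path_prefix: str
    actions: FrozenSet[str]  # subset of {"read", "write"}


@dataclass
class Policy:
    """Hypothetical policy: users map to roles, roles map to grants."""
    role_grants: Dict[str, List[Grant]] = field(default_factory=dict)
    user_roles: Dict[str, Set[str]] = field(default_factory=dict)

    def is_allowed(self, user: str, action: str, path: str) -> bool:
        for role in self.user_roles.get(user, set()):
            for grant in self.role_grants.get(role, []):
                if path.startswith(grant.path_prefix) and action in grant.actions:
                    return True
        return False  # deny by default when no grant matches


# Illustrative roles and users; none of these names come from the product.
policy = Policy(
    role_grants={
        "trainer": [Grant("/datasets/", frozenset({"read"}))],
        "curator": [Grant("/datasets/", frozenset({"read", "write"}))],
    },
    user_roles={"alice": {"trainer"}, "bob": {"curator"}},
)
```

The deny-by-default return is the key design choice: a user with no matching role, or a request outside every granted prefix, is rejected without needing an explicit deny rule.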
FUSE Non-Disruptive Upgrade
Traditional FUSE updates present significant operational challenges for production environments. When updating Linux FUSE services, administrators must restart the service, which forcibly terminates all active connections and mounted filesystems. This mandatory downtime disrupts running applications and business workflows, which is particularly problematic for data-intensive operations that rely on continuous access to FUSE-mounted data.
Alluxio's new FUSE non-disruptive upgrade feature fundamentally changes this paradigm. The technology enables in-place upgrades of the FUSE service while maintaining all existing connections and mount points. Applications continue operating normally throughout the update process. This advancement is particularly valuable for enterprises running 24/7 data pipelines or customer-facing applications that cannot tolerate downtime.
Known limitations in this release: read operations (read, stat) are preserved during FUSE upgrades (they hang and then resume within tens of seconds), while write (write, mv, delete) and list (readdir) operations still fail.
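Given this limitation, an application that lists directories during an upgrade window may want to retry on failure. The helper below is one possible application-side sketch, not an Alluxio feature: the function name, the retry count, and the backoff delays are assumptions, and it presumes the failure surfaces to the application as an `OSError`. Retrying is only safe for idempotent operations, so it is a better fit for listings than for writes.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_oserror(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op(), retrying on OSError with exponential backoff.

    The delay doubles after every failed attempt; the final failure
    is re-raised so the caller still sees the underlying error.
    """
    for attempt in range(attempts):
        try:
            return op()
        except OSError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

A call might look like `retry_on_oserror(lambda: os.listdir(mount_path))`, where `mount_path` is whatever FUSE mount point the deployment uses.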
Cluster Management Console Enhancements
Support Deploying Cluster Through Management Console
After the Alluxio K8s Operator is installed, the cluster deployment WebUI is available for completing the remaining setup steps through an intuitive graphical interface. The WebUI provides a user-friendly alternative to manual configuration, allowing administrators to visually manage cluster parameters, resource allocation, and deployment workflows. This feature significantly reduces deployment complexity while maintaining the flexibility of Alluxio's distributed architecture.


Enhance Job Management
In this release, Alluxio Management Console enhances job management functionality:
Add a meaningful name for a job
Support pagination on listing job history
Audit Log
Alluxio has introduced a new audit logging feature to enhance security and compliance monitoring. This functionality systematically records detailed access events, including user identities, operations performed (e.g., read/write), and timestamps. The logs enable administrators to analyze data access patterns, detect anomalies, and meet regulatory requirements.