Virtual Path Mapping

Alluxio's Virtual Path Mapping feature allows you to dynamically redirect requests from one path to another. When a user or application tries to access a "source" path, Alluxio transparently performs the operation on a different "destination" path.

This feature is currently available for Alluxio FUSE and the Alluxio CLI.

How It Works

Virtual Path Mapping uses regular expressions to define mapping rules. Each rule consists of:

  • A source (src): A regular expression that matches the original path requested by the client.

  • A destination (dst): A template for the new path where the operation will be redirected.

When a path is requested, Alluxio checks it against the list of mapping rules. The first rule that matches is applied, and the operation is forwarded to the destination path. If no rules match, the original path is used.
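The first-match behavior described above can be sketched in a few lines of Python. This is an illustration only, not Alluxio's implementation: the `map_path` helper and the second, catch-all rule are hypothetical, Python's `re` module stands in for whatever regex engine Alluxio uses, and support for placeholders beyond `{{ var1 }}` (such as `{{ var2 }}`) is an assumption.

```python
import re

# Illustrative rules; the second one is a hypothetical catch-all used to
# show that only the first matching rule is applied.
RULES = [
    {"src": r"^/a/b/c/(.*)$", "dst": "/x/y/z/{{ var1 }}"},
    {"src": r"^/a/(.*)$", "dst": "/fallback/{{ var1 }}"},
]

# Matches placeholders such as "{{ var1 }}" and captures the group number.
PLACEHOLDER = re.compile(r"\{\{\s*var(\d+)\s*\}\}")


def map_path(path, rules):
    """Return the destination of the first matching rule, else the original path."""
    for rule in rules:
        match = re.match(rule["src"], path)
        if match is None:
            continue
        # Substitute each {{ varN }} with the N-th capture group of the match.
        return PLACEHOLDER.sub(
            lambda m: match.group(int(m.group(1))) or "", rule["dst"]
        )
    return path


print(map_path("/a/b/c/file.txt", RULES))  # first rule wins
print(map_path("/a/other.txt", RULES))     # falls through to the second rule
print(map_path("/unmapped/file.txt", RULES))  # no rule matches
```

Because evaluation stops at the first match, more specific rules should be listed before broader ones.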

Enabling Virtual Path Mapping

Configuration is a two-step process: enabling the feature globally and then defining the specific mapping rules.

Step 1: Enable the Feature

In your alluxio-site.properties file (or via Helm chart values), add the following setting:

alluxio.user.virtual.path.mapping.enabled=true

Step 2: Define Mapping Rules

The mapping rules are configured dynamically via the Alluxio coordinator's REST API. You send a PUT request containing your rules as a JSON object.

Example curl command:

curl -sS 'http://<coordinator-host>:19999/api/v1/conf' -X PUT -H 'Content-Type: application/json' --data '{"key":"VirtualPathMappingEntity","conf":"{\"mappingRules\":{\"rules\":[{\"src\":\"^/test/a/(.*)$\",\"dst\":\"/test/b/{{ var1 }}\"}],\"virtualDirectories\":[\"/test/a\"]}}"}'

The --data payload contains a JSON object where the conf value is an escaped JSON string defining the rules.
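Hand-escaping the nested JSON is error-prone. One way to avoid it, sketched here in Python, is to serialize the rules twice: once to produce the conf string and once to produce the outer request body. The `build_conf_payload` helper is hypothetical; the key name and field layout are taken from the curl example above.

```python
import json


def build_conf_payload(rules, virtual_directories):
    """Build the request body: an outer JSON object whose conf value is a JSON string."""
    inner = {
        "mappingRules": {
            "rules": rules,
            "virtualDirectories": virtual_directories,
        }
    }
    # The inner json.dumps produces the rules document; the outer json.dumps
    # embeds it as an escaped string under "conf".
    return json.dumps({"key": "VirtualPathMappingEntity", "conf": json.dumps(inner)})


payload = build_conf_payload(
    [{"src": "^/test/a/(.*)$", "dst": "/test/b/{{ var1 }}"}],
    ["/test/a"],
)
print(payload)
```

The printed string can be passed to curl's --data option in place of the hand-written literal.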

Understanding the Rules

For clarity, here is a conf value in unescaped form. It maps any path under /a/b/c/ to the same relative path under /x/y/z/:

{
  "mappingRules": {
    "rules": [
      {
        "src": "^/a/b/c/(.*)$",
        "dst": "/x/y/z/{{ var1 }}"
      }
    ],
    "virtualDirectories": []
  }
}

  • src: The regular expression ^/a/b/c/(.*)$ matches the incoming path. The (.*) is a capture group that saves any characters that follow /a/b/c/.

  • dst: The {{ var1 }} placeholder inserts the value from the first capture group into the destination path.

As a result, a request for /a/b/c/file.txt is transparently redirected to /x/y/z/file.txt.

Complex Example

You can define multiple rules, which are evaluated from top to bottom. The first one that matches is used.

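This example routes paths under /foo/bar/ by the first character of the remaining path: a or b goes to A, e through g goes to B, and a digit goes to C. Any other path, such as /foo/bar/c.txt, matches no rule and is left unchanged: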

{
  "mappingRules": {
    "rules": [
      {
        "src": "^/foo/bar/([a-b].*)$",
        "dst": "/foo1/bar1/A/{{ var1 }}"
      },
      {
        "src": "^/foo/bar/([e-g].*)$",
        "dst": "/foo1/bar1/B/{{ var1 }}"
      },
      {
        "src": "^/foo/bar/([\\d].*)$",
        "dst": "/foo1/bar1/C/{{ var1 }}"
      }
    ],
    "virtualDirectories": []
  }
}

Note: In a JSON string, a backslash must itself be escaped, so the regex token \d is written as \\d.
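The escaping levels can be checked with a short Python sketch. It assumes Python's `json` and `re` modules as stand-ins for the tools Alluxio uses, and it only demonstrates how the backslash count changes at each level of encoding.

```python
import json
import re

# The rule as it appears in the JSON document: the backslash is doubled.
rule_json = r'{"src": "^/foo/bar/([\\d].*)$", "dst": "/foo1/bar1/C/{{ var1 }}"}'

rule = json.loads(rule_json)
pattern = rule["src"]  # after parsing, the pattern holds a single backslash

# The parsed pattern behaves as an ordinary regex with a digit class.
digit_match = re.match(pattern, "/foo/bar/9.log") is not None
letter_match = re.match(pattern, "/foo/bar/z.log") is not None

# Embedding the rule as a string inside another JSON document, as the conf
# value of the REST request does, escapes it a second time, which yields
# four backslashes before the d.
double_encoded = json.dumps(json.dumps(rule))

print(pattern)
print(digit_match, letter_match)
print(double_encoded)
```

In other words, the unescaped rule document uses \\d, while a hand-written conf string inside the request body needs \\\\d.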

Validating Your Rules

Before relying on your rules, test them with the provided command-line tool, which prints the path that a given input maps to:

$ bin/alluxio fs pathmapping /foo/bar/a.txt
-----------------------------------------------------------------
Input:  /foo/bar/a.txt
Output: /foo1/bar1/A/a.txt
Virtual Directory: false
-----------------------------------------------------------------

The Output field shows the resulting path after the mapping is applied.

Handling Parent Paths with Virtual Directories

A common issue when using path mapping with FUSE is that FUSE checks for the existence of a file's parent directories before accessing the file. If the original parent path doesn't exist, FUSE will return an error, even if the destination path is valid.

For example, with the rule ^/a/b/(.*)$ -> /x/y/{{ var1 }}, accessing /a/b/file.txt via FUSE will fail if the directory /a/b does not actually exist in Alluxio.

To solve this, you can define virtual directories. These are paths that Alluxio will always treat as existing directories, preventing FUSE from throwing an error.

{
  "mappingRules": {
    "rules": [
      {
        "src": "^/a/b/(.*)$",
        "dst": "/x/y/{{ var1 }}"
      }
    ],
    "virtualDirectories": [
      "/a",
      "/a/b"
    ]
  }
}
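The example above lists both /a and /a/b, that is, the source directory and each of its ancestors below the root. A small Python sketch can generate such a list from a source directory. The `virtual_directories_for` helper is hypothetical, and whether Alluxio requires every ancestor to be listed is inferred from the example rather than stated by the documentation.

```python
from pathlib import PurePosixPath


def virtual_directories_for(source_dir):
    """Return source_dir preceded by each of its ancestors below the root."""
    path = PurePosixPath(source_dir)
    # path.parents runs from the nearest parent up to "/"; reverse it so the
    # shortest path comes first, and drop the root itself.
    ancestors = [str(p) for p in reversed(path.parents) if str(p) != "/"]
    return ancestors + [str(path)]


print(virtual_directories_for("/a/b"))
```

The resulting list can be used as the value of virtualDirectories when building the rules document.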

You can check if a path is considered a virtual directory with the validation tool:

$ bin/alluxio fs pathmapping /a
-----------------------------------------------------------------
Input:  /a
Output: null
Virtual Directory: true
-----------------------------------------------------------------

Limitations

  • Performance: A large number of complex regex rules can slow down path operations, because every path request is checked against the rules in order until one matches.

  • Listing Directories (ls): When you list a parent directory (e.g., ls /foo/bar), the output will not show the mapped destination directories (e.g., /foo1/bar1/A, /foo1/bar1/B). However, you can still access the files within them directly.
