Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fileio] Add file-io.allow-cache for RESTTokenFileIO #5054

Merged
merged 6 commits into from
Feb 12, 2025

Conversation

JingsongLi
Copy link
Contributor

Purpose

  1. Add file-io.allow-cache for RESTTokenFileIO, file io should be closed in RESTTokenFileIO, so it can not be cached.
  2. Improve HadoopCompliantFileIO, it should use Map.computeIfAbsent.
  3. We should use new Configuration(false) to not load files in configuration for improving performance.

Tests

API and Format

Documentation

<td><h5>resolving-file-io.enabled</h5></td>
<td style="word-wrap: break-word;">false</td>
<td>Boolean</td>
<td>Whether to enable resolving fileio, when this option is enabled, in conjunction with the table's property data-file.external-paths, Paimon can read and write to external storage paths, such as OSS or S3. In order to access these external paths correctly, you also need to configure the corresponding access key and secret key.</td>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fileio -> file io same with line 87?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is ok in description, we can just be careful for option key, it is API part.


@Override
public void close() {
if (!allowCache) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means, if allowCache, the fileIo need to close by outside?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not allow cache, the fileio should be closed.

If allow cache, the fileio is in the static cache, we cannot close them.

<td><h5>file-io.allow-cache</h5></td>
<td style="word-wrap: break-word;">true</td>
<td>Boolean</td>
<td>Whether to allow static cache in file io implementation. If not allowed, this means that there may be a large number of FileIO instances generated, enabling caching can lead to resource leakage.</td>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question here:
If enabling caching, the number of FileIO instances will be reduce. Why enabling caching can lead to resource leakage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our original design was to place it in the cache of static variables, but for the case of REST Catalog, it may have a lot of FileIO generated, and there may be a FileIO for each table because each table has different file access permissions.

So if there are too many FileIOs in this situation, we cannot cache them in memory casually. If there are too many, it will lead to too many resources, that is, resource leakage.

@jerry-024
Copy link
Contributor

LGTM

@JingsongLi JingsongLi merged commit 155e51c into apache:master Feb 12, 2025
13 checks passed
JackeyLee007 pushed a commit to JackeyLee007/paimon that referenced this pull request Feb 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants