Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow to override Base(Large)RepeatedValueViewVector #488

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

rluvaton
Copy link

@rluvaton rluvaton commented Jan 7, 2025

This is done as different arrow implementation have different default list item field name

For example, Rust implementation of Arrow (arrow-rs) has default value of item (https://github.com/apache/arrow-rs/blob/4f1f6e57c568fae8233ab9da7d7c7acdaea4112a/arrow-schema/src/field.rs#L125).

And according to the specification, the name of the list is not required to be a specific string. so to make it easier to use Java Arrow implementation with other implementation of Arrow, I'm making the DATA_VECTOR_NAME configurable

I will add tests after I know this change is desired and will be approved

Another solution that is good for me, is to allow changing the field name of existing list (as sometimes I'm not the one that creates the list but all the lists are coming through me, currently it is not possible as far as I can tell.

the only place I see in the codebase that do something like that is in:

String modifiedSchema = jsonSchema.replace("$data$", "[DEFAULT]");

which convert the schema to json, change the field name and create the schema again (making sure the new field name is used)

This is done as different arrow implementation have different default list item field name

For example, Rust implementation of Arrow (`arrow-rs`) has default value of `item` (https://github.com/apache/arrow-rs/blob/4f1f6e57c568fae8233ab9da7d7c7acdaea4112a/arrow-schema/src/field.rs#L125).

And according to the specification, the name of the list is not required to be a specific string. so to make it easier to use Java Arrow implementation with other implementation of Arrow, I'm making the `DATA_VECTOR_NAME` configurable
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd make more sense to allow changing defaultDataVectorName for an instance? I'm not a big fan of mutable globals

@rluvaton
Copy link
Author

rluvaton commented Jan 7, 2025

I think it'd make more sense to allow changing defaultDataVectorName for an instance? I'm not a big fan of mutable globals

Changing defaultDataVectorName for an instance is too late, no? as the instance already created

@lidavidm
Copy link
Member

lidavidm commented Jan 7, 2025

The child vector is created lazily, though

@rluvaton
Copy link
Author

rluvaton commented Jan 7, 2025

But it can still be too late, is there a way to change a field (or field name) of existing list?

@lidavidm
Copy link
Member

lidavidm commented Jan 7, 2025

I don't see how a mutable global lets you change the name of an existing list either, though.

Once the child is created I think you'd have to recreate the child instead of being able to mutate the name.

@lidavidm
Copy link
Member

lidavidm commented Jan 7, 2025

Do you have a sample to show the scenario you are concerned about?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants