Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extractor: add explicit numerical columns tconvert_columns_to_category #4

Merged
merged 1 commit into from
Jun 3, 2024

Conversation

awillecke
Copy link
Member

The PlottingReaderFeather converts certain columns to categories automatically. numerical_columns should be used to specify which columns should not be converted, when reading data. If, however, a column is already categorial, it will not be converted to numerical explicitly, which does not function as expected. This commit adds the explicit conversion to float for numerical_columns .

@hagau hagau merged commit 0b1c3e1 into main Jun 3, 2024
1 check passed
@hagau
Copy link
Collaborator

hagau commented Jun 3, 2024

Nice, thank you! numerical_columns is indeed more expressive.
I'll merge this as is, since it is an improvement over the existing code, but I think I'll need to change the convert_columns_to_category in the near future.
In most cases the conversion to float will be correct, but in some cases other data types will be called for. This can be achieved with a dictionary of pairs of column names and data types to override the implicit type casting done by pandas. This can be optional and the conversion to float can be the default if a set is given instead of a dictionary.
Instead of excluded_columns it's probably better to explicitly name the categorical columns, since constructing the set for the heuristic used right now is somewhat costly with regard to memory and time for larger DataFrames.

@awillecke
Copy link
Member Author

Yeah, that is a good idea to provide a data type dictionary explicitly when extracting data 👍 I created a draft that I'm currently working on. Maybe it is a good place for integrating and testing.

@hagau
Copy link
Collaborator

hagau commented Jun 3, 2024

@awillecke Yes, good idea. I'll use that as a basis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants