-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to read CSV without header row #82
Conversation
Recreating PR due to #81 |
pydax/loaders/_table.py
Outdated
@@ -37,6 +37,8 @@ def load(self, path: Union[_typing.PathLike, Dict[str, str]], options: SchemaDic | |||
- ``columns`` key specifies the data type of each column. Each data type corresponds to a Pandas' | |||
supported dtype. If unspecified, then it is default. | |||
- ``delimiter`` key specifies the delimiter of the input CSV file. | |||
- ``header`` key specifies if the first row of the CSV file contains the headers. Defaults to True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- ``header`` key specifies if the first row of the CSV file contains the headers. Defaults to True | |
- ``header`` key specifies if the first row of the CSV file contains the headers. Defaults to True. |
tests/test_loaders.py
Outdated
noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['header'] = False | ||
with pytest.raises(ValueError) as exinfo: # Pandas should error from trying to read string as another dtype | ||
Dataset(noaa_jfk_schema, tmp_path, mode=Dataset.InitializationMode.DOWNLOAD_AND_LOAD) | ||
assert('could not convert string to float' in exinfo.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception is raised in the previous line and this assertion should have never been executed:
assert('could not convert string to float' in exinfo.value) | |
assert 'could not convert string to float' in str(exinfo.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops thanks for catching
pydax/loaders/_table.py
Outdated
@@ -55,9 +57,18 @@ def load(self, path: Union[_typing.PathLike, Dict[str, str]], options: SchemaDic | |||
else: | |||
dtypes[column] = type_ | |||
|
|||
names = None | |||
header = None | |||
if options.get('header', True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the document, do you mean
if options.get('header', True): | |
if options.get('header', True) is not False: |
Or you can actually make the function based on Python's evaluation of whether the value is true or false rather than deciding whether it is exactly False
or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be confusing if we make the function based on Python's evaluation. If we accidentally set header
to ''
. Maybe we could rename the key to no_header
and then use Python's evaluation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
tests/test_loaders.py
Outdated
noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['header'] = True | ||
self.test_csv_pandas_loader(tmp_path, noaa_jfk_schema) | ||
|
||
noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['header'] = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps also add a test on the value being an empty string and None
?
tests/test_loaders.py
Outdated
noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['no_header'] = False | ||
self.test_csv_pandas_loader(tmp_path, noaa_jfk_schema) | ||
|
||
noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['no_header'] = '' | ||
self.test_csv_pandas_loader(tmp_path, noaa_jfk_schema) | ||
|
||
noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['no_header'] = None | ||
self.test_csv_pandas_loader(tmp_path, noaa_jfk_schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using a for loop for these. Everything else LGTM now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Checklist:
minor change.
For the following questions, only check the boxes that are applicable.
Ensure that the pull request text above the horizontal line is descriptive.
Add/update relevant tests.
Add/update relevant documents.
Do you have triage permission of this repository?