Add ability to read CSV without header row #82

djalova · 2020-12-15T22:20:35Z

Checklist:

Does this pull request close an issue? We encourage you to open an issue first if this pull request (PR) is not a
minor change.
- No
  - The change in this pull request is minor.
- Yes
  - Close CSV loader should support those without headers #54

For the following questions, only check the boxes that are applicable.

djalova · 2020-12-15T22:21:13Z

Recreating PR due to #81

xuhdev · 2020-12-15T22:32:58Z

pydax/loaders/_table.py

@@ -37,6 +37,8 @@ def load(self, path: Union[_typing.PathLike, Dict[str, str]], options: SchemaDic
               - ``columns`` key specifies the data type of each column. Each data type corresponds to a Pandas'
                 supported dtype. If unspecified, then it is default.
               - ``delimiter`` key specifies the delimiter of the input CSV file.
+               - ``header`` key specifies if the first row of the CSV file contains the headers. Defaults to True


Suggested change

- ``header`` key specifies if the first row of the CSV file contains the headers. Defaults to True

- ``header`` key specifies if the first row of the CSV file contains the headers. Defaults to True.

xuhdev · 2020-12-15T22:35:02Z

tests/test_loaders.py

+        noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['header'] = False
+        with pytest.raises(ValueError) as exinfo:  # Pandas should error from trying to read string as another dtype
+            Dataset(noaa_jfk_schema, tmp_path, mode=Dataset.InitializationMode.DOWNLOAD_AND_LOAD)
+            assert('could not convert string to float' in exinfo.value)


The exception is raised on the previous line, so this assertion would never have been executed:

Suggested change

assert('could not convert string to float' in exinfo.value)

assert 'could not convert string to float' in str(exinfo.value)

Whoops thanks for catching
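For reference, a minimal sketch of the pattern discussed above. `parse_temperature` is a hypothetical stand-in for the loader call, not part of pydax. Once the call inside the `with` block raises, nothing after it in that block runs, so the message check has to sit after the block. Note also that `exinfo.value` is the exception object itself, so it needs `str()` before a substring check.

```python
import pytest


def parse_temperature(text):
    # Hypothetical stand-in for the loader: converting a header string
    # such as 'DATE' to float raises ValueError.
    return float(text)


def check_error_message():
    reached_after_raise = False
    with pytest.raises(ValueError) as exinfo:
        parse_temperature('DATE')
        reached_after_raise = True  # never runs: the line above raises

    # Assertions belong outside the ``with`` block, where they actually execute.
    assert not reached_after_raise
    assert 'could not convert string to float' in str(exinfo.value)
    return str(exinfo.value)
```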

xuhdev · 2020-12-15T22:37:50Z

pydax/loaders/_table.py

@@ -55,9 +57,18 @@ def load(self, path: Union[_typing.PathLike, Dict[str, str]], options: SchemaDic
            else:
                dtypes[column] = type_

+        names = None
+        header = None
+        if options.get('header', True):


Based on the documentation, do you mean:

Suggested change

if options.get('header', True):

if options.get('header', True) is not False:

Alternatively, the function could rely on Python's truthiness evaluation of the value rather than checking whether it is exactly False.

I think it might be confusing if we base the behavior on Python's truthiness evaluation, for example if we accidentally set header to ''. Maybe we could rename the key to no_header and then use Python's evaluation?

Sounds good.
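To make the trade-off in this thread concrete, here is a small sketch comparing the three checks discussed. The helper names are hypothetical and exist only for illustration; they are not the code that was merged.

```python
def has_header_truthy(options):
    """Original check: any falsy value (False, '', None) means 'no header'."""
    return bool(options.get('header', True))


def has_header_strict(options):
    """Suggested check: only the literal False means 'no header'."""
    return options.get('header', True) is not False


def has_header_from_no_header(options):
    """Renamed key: a missing or falsy ``no_header`` means the file has a header."""
    return not options.get('no_header')


# Print how each check treats the values raised in the discussion.
for value in (True, False, '', None):
    print(repr(value),
          has_header_truthy({'header': value}),
          has_header_strict({'header': value}),
          has_header_from_no_header({'no_header': value}))
```

With the renamed key, an accidental `''` or `None` falls back to the default behavior (the file has a header), which is why plain truthiness becomes safe to use.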

xuhdev · 2020-12-15T22:38:28Z

tests/test_loaders.py

+        noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['header'] = True
+        self.test_csv_pandas_loader(tmp_path, noaa_jfk_schema)
+
+        noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['header'] = False


Perhaps also add a test on the value being an empty string and None?

xuhdev · 2020-12-16T00:52:00Z

tests/test_loaders.py

+        noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['no_header'] = False
+        self.test_csv_pandas_loader(tmp_path, noaa_jfk_schema)
+
+        noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['no_header'] = ''
+        self.test_csv_pandas_loader(tmp_path, noaa_jfk_schema)
+
+        noaa_jfk_schema['subdatasets']['jfk_weather_cleaned']['format']['options']['no_header'] = None
+        self.test_csv_pandas_loader(tmp_path, noaa_jfk_schema)


How about using a for loop for these? Everything else LGTM now.
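A rough sketch of what the loop could look like. `load_rows` is a hypothetical stand-in for the real loader and the sample rows are made up for illustration; the actual test would keep calling `self.test_csv_pandas_loader(tmp_path, noaa_jfk_schema)` inside the loop.

```python
def load_rows(lines, options):
    """Hypothetical stand-in for the CSV loader: skips the first row when it is a header."""
    if options.get('no_header'):
        return list(lines)
    return list(lines[1:])


def test_falsy_no_header_values():
    lines = ['DATE,TEMP', '2020-12-15,3.5']
    # Every falsy value should behave like the default (file has a header row).
    for falsy in (False, '', None):
        rows = load_rows(lines, {'no_header': falsy})
        assert rows == ['2020-12-15,3.5'], f'unexpected rows for no_header={falsy!r}'


test_falsy_no_header_values()
```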

xuhdev

LGTM

djalova added 4 commits December 15, 2020 10:29

Add ability to read CSV without headers

5cd8f4d

Remove print

7c6b5eb

Fix lint

a46e8a2

Address review comments

e837843

djalova requested review from xuhdev and edwardleardi December 15, 2020 22:21

xuhdev reviewed Dec 15, 2020

xuhdev mentioned this pull request Dec 15, 2020

Add ability to read CSV without header row #77

Closed

20 tasks

djalova added 3 commits December 15, 2020 14:58

Address PR comments

a0bc696

Change key to no_header

df36872

Merge branch 'master' into header

38eba37

xuhdev reviewed Dec 16, 2020

djalova added 2 commits December 15, 2020 17:10

Clean up test cases

6715570

Merge branch 'header' of https://github.com/CODAIT/pydax into header

afd75b8

xuhdev approved these changes Dec 16, 2020

xuhdev merged commit 9728b9f into master Dec 16, 2020

xuhdev deleted the header branch December 16, 2020 01:15
