-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(1/7) RFC 79: Implement incremental upload of tab delimited data. #45
Conversation
- Calculate number of lines in the file in the loader - Remove unused imports and fields - Reuse constructors - Reuse common parsing logic in tab delimiter importer - Show full stacktrace which helps in dinding where tests errored out
It's dangerous as we would further mess up the data in the row
Do it for all data types, not only MAF
- Split big tab. delim test to multiple tests based on data type. - Use ImportProfileData instead of ImportTabDelimData for testing. - We cover more logic with such tests. - This is more stable interface. ImportTabDelimData can be refactored.
…ation rows or not
args variable was not declared
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @forus , amazing work 👍
I do have some questions on sections that I would like to double-check with you, and some naming improvements. Please take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This loader covers the following data types:
Note: We do not support incremental upload of GSVA data although tab-delimited uploader covers this type with non-interactive upload. The incremental upload for GSVA data will throw an error.
We decided to exclude GSVA as there is unclarity from our part on whether GSVA scores of existing samples need to be recalculated when a new sample is added to the data.
This PR makes the tab. delimited data loader run in a single transaction to escape corruption of tsv rows (e.g. sample has been added to the header, but not to the info rows).
The first commit (b2c1c21) contains the code improvements. It could be handy to review it first.
=======
Please review #44 next.
Please note, to avoid the costly process of rebasing all pull requests (PRs), all feedback will be incorporated into the final PR.