ignore unused fields from CSV file. #131
The first part of your scenario is along the lines of what Frames is designed for. If it were possible, a good choice here would be to treat this as two pieces of code: the processing part, which only cares about the fields it actually uses, and the loading part. But if what you want is to write a program that will handle arbitrary CSV files whose only constraint is that they contain the fields you want, then this becomes much more dynamic. Let's make sure we agree on the wanted types before writing code. We have a row type describing the fields the program needs. To be very specific, what is needed is a tweak of the parsing function so that it does not insist on the data file matching up exactly with the wanted type. Does this all sound like it addresses your needs?
Thanks for your quick answer; I think you summarized the issue very well, and as far as I understand, your conclusion is correct. Being able to parse the data is the important point, and I am in favor of the second, more dynamic approach. If you think the required changes are feasible, there is no rush, and in the meantime I can go ahead and further check that Frames is a good fit for my workflow. I am still very new to Frames and Haskell, and getting started is not as easy as with some other frameworks.
I think the README needs to provide better motivation. The original point of the library was to couple the program and the data file so you get compile-time errors when your analysis code diverges from the realities of your data file. Once you have extracted richer type information, you can do things like make more efficient use of memory, too. That focus on a particular trick -- the code matches the data -- is fine as far as it goes, but better support for custom row parsing, as you need here, is a great way to empower the programmer to deal with varied data. I'll put some time into it as soon as I can.
Sounds great, I will add this project to my watch list. Thanks.
I have been working with this as well. Here is one related, simple practical example that I have been facing too.
Hi, I tried to follow your instructions above and worked on a modified version of the parsing. The idea is to define a record type manually:
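As a plain-Haskell sketch of such a manually defined record (this is not the Frames API; the `Wanted` type and `parseWanted` helper are hypothetical names, and `field1`/`field2` follow the example header from this issue):

```haskell
import Text.Read (readMaybe)

-- Hypothetical record holding only the columns the program cares about.
data Wanted = Wanted { field1 :: Int, field2 :: String }
  deriving (Eq, Show)

-- Build a Wanted row from (header name, cell value) pairs, silently
-- ignoring any extra columns the file happens to contain.
parseWanted :: [(String, String)] -> Maybe Wanted
parseWanted cells =
  Wanted <$> (lookup "field1" cells >>= readMaybe)
         <*> lookup "field2" cells
```

A row such as `[("field1","3"),("field3","x"),("field2","hi")]` parses to `Just (Wanted 3 "hi")`, while a row missing `field1` (or carrying an unreadable value for it) yields `Nothing`.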
There needs to be a bit of code to parse the data file header and get the position of each of the needed fields:
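One way to sketch that header scan in plain Haskell (`fieldPositions` is a hypothetical helper name, not part of Frames):

```haskell
import Data.List (elemIndex)

-- Given the wanted column names and the file's header row, return the
-- index of each wanted column, or Nothing if any wanted column is
-- missing from the file.
fieldPositions :: [String] -> [String] -> Maybe [Int]
fieldPositions wanted header = traverse (`elemIndex` header) wanted
```

For a header `["field1","field2","field3"]`, asking for `["field1","field2"]` yields `Just [0,1]`, while asking for a missing `field4` yields `Nothing`.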
This function is fed the record column names. The parsing function `readRec` then needs to be changed a bit so that it tolerates rows with extra, unused columns:
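A minimal sketch of that position-based row reading in plain Haskell (the `splitCsv` and `selectFields` names are made up for illustration, and real CSV parsing must also handle quoting, which this ignores):

```haskell
-- Naive field splitter: breaks a line on commas only, with no support
-- for quoted fields.
splitCsv :: String -> [String]
splitCsv s = case break (== ',') s of
  (f, ',' : rest) -> f : splitCsv rest
  (f, _)          -> [f]

-- Keep only the cells at the wanted positions, in the wanted order;
-- extra columns in the row are simply ignored.
selectFields :: [Int] -> [String] -> Maybe [String]
selectFields positions row = traverse pick positions
  where
    pick i
      | i >= 0 && i < length row = Just (row !! i)
      | otherwise                = Nothing
```

For example, `selectFields [0,2] (splitCsv "a,b,c,d")` is `Just ["a","c"]`, no matter how many trailing columns the row carries.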
I did not look at performance, and the above code is unlikely to work in the general case; it is also probably not the solution you had in mind. But for my usage it is already helpful; hopefully we can integrate this sort of feature nicely into Frames!
Hi! I wonder if the work already in https://github.com/adamConnerSax/Frames-streamly would be of help? You can see some examples in: https://github.com/adamConnerSax/Frames-streamly/blob/master/test/FramesStreamlySpec.hs. This addresses some versions of the issues you have mentioned here: skipping columns at load time as well as some enhanced value parsing. |
Hi,
Let's imagine I am working with a tabular data file whose header might look like this: "field1,field2,field3".
I am only using "field1" and "field2". "field3" might or might not be there, and a new field "field4" might appear in the future. I do not have control over the number of columns; I just know that the fields I need are there.
Is there an easy way for me to define a type that contains only the fields I need and parse a CSV into a Frame possibly ignoring all the other unused fields or simply parse them as text?
If possible I would like to avoid using an example CSV file for types at compilation time.