[Issue 4] Incremental parsing for CSV with Headers #22

CristhianMotoche · 2022-10-04T15:36:33Z

It solves part of #4. I'll add the changes 'incrementally' (pun intended).

Changes:

Add new module Data.Csv.Parser.Megaparsec.Incremental
Expose decodeHeader
Add tests for it

TODO:

Fix TODO
Document code
Address some edge cases

CristhianMotoche · 2022-10-09T15:02:49Z

Hey @cptrodolfox After some thought, I see this doesn't really solve #4. I'm a bit confused about the state of Megaparsec's support for incremental parsing. The Changelog mentions that:

Now state returned on failure is the exact state of parser at the moment when it failed, which makes incremental parsing feature much better and opens possibilities for features like “on-the-fly” recovering from parse errors.

This was for version 4.4.0, which was released in Feb 2016. However, I later noticed this issue on Megaparsec, and it seems:

Megaparsec is not a streaming/incremental parsing library

Therefore, I assume we won't actually have an easy way to solve #4 at the moment.

I'll take another look later to be 100% sure whether this is impossible to fix for now.

CristhianMotoche · 2022-10-09T21:22:21Z

Hey @cptrodolfox I think a possible solution would be what @mrkkrp suggested here:

You could perhaps write a parser for dealing with a single line and then glue everything together with a streaming library.

I think that could be an option. Nevertheless, I would prefer to implement it in a separate library (e.g. cassava-megaparsec-incremental) to avoid adding an extra dependency to cassava-megaparsec. What do you think?

cptrodolfox · 2022-10-18T20:16:33Z

Hey @CristhianMotoche, sorry for the late reply. I think a better approach is to see how cassava implements incremental parsing. From what I can gather from cassava's source code, it implements its own type for incremental parsing.

https://hackage.haskell.org/package/cassava-0.5.3.0/docs/src/Data.Csv.Incremental.html#Parser

CristhianMotoche · 2022-11-11T15:51:19Z

Hey @cptrodolfox Sorry for the late reply as well.

That was my approach at first. cassava has the HeaderParser data type, which has three constructors: FailH, PartialH and DoneH. It implements incremental parsing by pattern matching on the results of attoparsec, whose IResult data type includes a constructor (called Partial) that carries a continuation for resuming the parse with more input.

Therefore, we cannot have incremental parsing like cassava's, since megaparsec doesn't provide an equivalent of that constructor. Does that make sense?
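To make the shape of that approach concrete, here is a toy model of a continuation-based result type, using only base. It is not cassava's or attoparsec's actual code: the names Step, Failed, NeedMore, Finished, headerLine, splitOn and feedAll are made up for illustration, and the header parser ignores quoting entirely.

```haskell
-- Toy model of a continuation-based incremental result (base only).
-- All names are hypothetical; they only mirror the three-constructor shape
-- of attoparsec's IResult and cassava's HeaderParser.
data Step a
  = Failed String                -- parsing failed with a message
  | NeedMore (String -> Step a)  -- out of input: feed another chunk to resume
  | Finished String a            -- leftover input and the parsed value

-- Read one header line, asking for more input until a '\n' arrives.
-- Feeding an empty chunk signals end of input. Quoted fields are NOT handled.
headerLine :: Step [String]
headerLine = go ""
  where
    go acc = NeedMore $ \chunk ->
      case break (== '\n') (acc ++ chunk) of
        (line, '\n' : rest) -> Finished rest (splitOn ',' line)
        (partial, _)
          | null chunk -> Failed "unexpected end of input in header"
          | otherwise  -> go partial

-- Split a string on a separator character, keeping empty fields.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn sep rest

-- Feed chunks one by one for as long as the parser asks for more.
feedAll :: Step a -> [String] -> Step a
feedAll (NeedMore k) (c : cs) = feedAll (k c) cs
feedAll step _ = step

main :: IO ()
main =
  case feedAll headerLine ["na", "me,a", "ge\nAda,36\n"] of
    Finished rest hdr -> print (hdr, rest)  -- (["name","age"],"Ada,36\n")
    Failed msg        -> putStrLn ("error: " ++ msg)
    NeedMore _        -> putStrLn "need more input"
```

The point of the sketch is the NeedMore case: the caller gets back a function to call with the next chunk, which is the piece a plain "error or result" return type cannot express.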

CristhianMotoche · 2022-12-12T01:38:54Z

Hey @cptrodolfox I've been trying to replicate something similar to decodeWithP from cassava, but I don't think it will be possible, since the only possible results of Text.Megaparsec.parse are an error or a parsed record. I tried consuming input up to a line break, but there can be line breaks in the middle of a quoted field (e.g. foo,bar,"hello\nworld",123). I'm running out of ideas to solve this incrementally. Please let me know if you have something in mind.
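One possible way around the embedded line break, sketched here with base only, is to track whether the scanner is inside double quotes and treat '\n' as a record boundary only when it is not. This is an untested idea rather than anything in this PR: splitRecords and feedChunk are hypothetical names, only '\n' is handled (a "\r\n" ending would leave a trailing '\r' on the record), and malformed quoting is not detected.

```haskell
-- Quote-aware record splitting (base only). All names are hypothetical.

-- Split buffered input into complete CSV records plus an unfinished tail.
-- A '\n' ends a record only outside double quotes. An escaped quote ("")
-- toggles the flag twice, so it leaves the quote state unchanged.
splitRecords :: String -> ([String], String)
splitRecords = go False ""
  where
    go _ acc [] = ([], reverse acc)
    go inQuotes acc (c : cs)
      | c == '"' = go (not inQuotes) (c : acc) cs
      | c == '\n' && not inQuotes =
          let (records, rest) = go False "" cs
          in (reverse acc : records, rest)
      | otherwise = go inQuotes (c : acc) cs

-- Combine the tail left over from the previous chunk with a new chunk.
-- The tail always starts at a record boundary, so rescanning it is safe.
feedChunk :: String -> String -> ([String], String)
feedChunk leftover chunk = splitRecords (leftover ++ chunk)

main :: IO ()
main = do
  let (records1, rest1) = feedChunk "" "foo,bar,\"hello\nwor"
      (records2, rest2) = feedChunk rest1 "ld\",123\nbaz,"
  print records1  -- []
  print records2  -- ["foo,bar,\"hello\nworld\",123"]
  print rest2     -- "baz,"
```

Each complete record could then be handed to a single-record parser run with Text.Megaparsec.parse, with the chunks supplied by whatever streaming library is chosen, along the lines of the suggestion quoted earlier in this thread.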

CristhianMotoche added 2 commits October 4, 2022 08:36

WIP

6bd2237

Parse until end of string

7ed4357

CristhianMotoche added the enhancement label Oct 4, 2022

CristhianMotoche self-assigned this Oct 4, 2022

CristhianMotoche added 2 commits October 5, 2022 08:53

Update tests

c404589

Let user avoid calling parser with an empty string

07970f0

CristhianMotoche marked this pull request as ready for review October 7, 2022 17:55

CristhianMotoche requested a review from cptrodolfox October 7, 2022 17:55

CristhianMotoche marked this pull request as draft October 9, 2022 15:02

WIP

43da223
