-
-
Notifications
You must be signed in to change notification settings - Fork 559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement parsing extensions #123
base: master
Are you sure you want to change the base?
Implement parsing extensions #123
Conversation
yay MSVC \o/ I'll have a look at the issues tomorrow, hoping it does not just uncover a new layer :) |
src/cmark.h
Outdated
cmark_node*cmark_parser_add_child(cmark_parser *parser, | ||
cmark_node *parent, | ||
cmark_node_type block_type, | ||
int start_column); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't extensions simply use cmark_node_append_child
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I suggest you look at the implementation of add_child
in src/blocks.c
, this function finalizes open parent blocks, and creates the new block as open. This is really a parsing-specific method, and it cannot be replaced by cmark_node_append_child
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this to a separate public header
Some general notes:
But that's not what your table and strikethrough extensions do. They define their own block types in the cmark core. If your extension nodes used the custom node types, it would be more convincing that this works for other extensions, too. Regarding dynamic loading and plugin discovery: I think we should leave this out for now, and only have a function to register an extension with a parser. It might also be a good idea to move everything related to extensions to a separate public header file. There are some functions like |
Thanks for the review, answering the general comment here:
That is what my extensions do, they don't define their own block types, cmark does, it so happens that the commits for adding some of them live in the same branch. For reference here's the code for a bunch of other extensions: https://github.com/hotdoc/hotdoc/blob/master/hotdoc/parsers/cmark_gtkdoc_extension.c If you look at https://github.com/hotdoc/hotdoc/blob/master/hotdoc/parsers/cmark_gtkdoc_extension.c#L256 for example, you will see that this extension instantiates a regular CMARK_NODE_CODE_BLOCK, it's a totally legit use case for the syntax extensions design I propose to create standard block types according to their own syntax rules, this makes them robust (block precedence is enforced by cmark's core) and concise (rendering for all possible output formats is handled by cmark's core) The key element in this proposed design is that extensions do not define their own block types, as I believe there is no satisfying way to ensure inter-extension block precedence / can_contain rules. That is also why my extensions do not use the "CMARK_NODE_CUSTOM_*" types, as I am not interested in a fragile solution, or in implementing format-specific rendering code. There is also no actual correlation between the "convicingness" (is that a word?) of my proposal and the fact that my extensions (do not) use these types. Note also that the two new block types I implemented in cmark are two extensions for which there is (or at least seems to be) a clear support intention for upstream in the future. I think it can't really hurt to implement the rendering code for them ahead of definitive syntax rules, quite the contrary actually.
ack, I'll split that out
That could make sense, along with an "unstable" warning, and maybe a preprocessor directive to enable it (CMARK_ENABLE_EXTENSION_API or something like that) I would be interested in @jgm's opinion on that last point Last thing, @nwellnhof I noticed you made a few comments on outdated diffs, and intentionally didn't address these, can you make these comments again on the current diff? I've had a lot of work today, and what little time I had to work on cmark I spent addressing this (very appreciated) review, I will make the agreed-upon changes tomorrow (I hope :) Thanks! |
790a6ea
to
3843151
Compare
I apologize for the noise here (though github should apologize really), I stick to a rebase workflow because I prefer having a clean history to work with, instead of piling up commits on top of my branch. EDIT: seems it sort of fixed itself, at the time I wrote that github was displaying the original commits and their multiple new versions |
3226a53
to
4cee305
Compare
I have addressed all the review comments, either correcting the issues that were pointed out or answering the comments. This branch now installs a new public header "cmark_extension_api.h", separate from cmark.h, to be included and used by syntax extension writers. Additionally, I've also separated the plugin loading and discovery code to its own commit, however after thinking back on it I'd really like @nwellnhof to come true on the windows port promise: this is a pretty important part of the functionality that you're asking me to take away from the branch, and you have pretty vehemently protested my proposal of using the GLib, which would have offered us a portable and well-tested plugin interface. It would be a bit ugly to oppose this proposal, state that you can port the code to Windows, then just ask me to drop it when comes the time to actually do what you promised. If you do not have time to work on that, I propose we merge the linux implementation to start with. @jgm, do you have time for reviewing this branch these days? |
4cee305
to
95c2f16
Compare
There is now a PHP PECL extension of cmark (https://github.com/krakjoe/cmark), but in order to utilize it within https://github.com/thephpleague/commonmark (which is highly extensible), the underlying It would be nice to get some traction on this again. |
@MarkCarver I completely agree that this would be lovely, but we'd need to get @jgm back in here, and validate the design decision made by this PR, which was to implement support for output node types (eg tables) without an official agreed upon input syntax. I am still pretty convinced this is a safe enough decision, but well :) |
Any updates on this? |
ping @jgm |
Sorry, I just haven't had the bandwidth to devote to this. This extension mechanism has been implemented in github's fork of cmark. Those who need extensions can get them there. Since this library purports to be the reference implementation, and it needs to be updated every time the spec is changed, it makes sense to me to keep it simple and not worry about extensions for now at least.... |
Well let's just close this then. Edit: it's a bit annoying that a fork was the solution to this |
Please don't close this issue. I still believe it's needed as a primary feature in cmark, regardless of how long it may take. Offering https://github.com/github/cmark-gfm as a "solution" is really much of a solution for https://github.com/krakjoe/cmark or https://github.com/thephpleague/commonmark Having the ability to extend cmark should be a global ability, not some GFM implementation. |
I'd also favor keeping this open. I'm not saying the fork was the solution. I'm just saying that, for now, I don't have the time to deal with it, and keeping the profile of this project slimmer makes it easier for me to keep up with it. |
I'm open to merging this, really. But it's a huge patch and surveying it is not simple. And I'm reluctant to ask @MathieuDuponchelle to rebase it again so it can be reviewed, given my past footdragging on this. If someone else wants to try to get it in shape, I can take a look, but no guarantees. |
I think it would make more sense to split this PR up, i.e. the extension API(s) as a single PR and then the actual subsequent extension implementations into their own individual PRs? |
reopened, I had ideas on how to make this more palatable and if there's interest I can give it a go again at some point, I don't think after 3 years there is much of an urgency anyway :) |
I think the extension API is easier to review alongside extension examples tbh, however I can rework this to disallow mixing / matching multiple extensions from different origins, thus allowing extensions to define their own node types in a "safe" manner, that's basically what github ended up doing iirc |
Is linking in |
Did GitHub keep the dynamic linking aspect of this, I don't recall? |
My plan would get rid of the dl linking entirely |
I think they didn't, but that was a long time ago :) The problem without dynamic linking is that you end up with two solutions for people who want extensions:
|
Can you clarify this? You want to avoid linking in dl, but still have dynamic plugins? How is it done? (I'm sorry, I've forgotten a lot of the context.) |
No, those two potential solutions were in the event that cmark wouldn't allow for third-party / dynamic plugins, hence not use dl / equivalent for other platforms. I'm on the fence as to whether I want dynamic extensions or not, as in my specific case I have extensions for which it would make no sense to maintain them in cmark itself (eg, inline links to symbol names with |
I'm pretty sure you don't need anything as heavyweight as an extension to support these. These could be handled with a transformation of the AST after parsing, no? |
nope, not if I care about having proper parsing at least :) I don't remember the exact details, but a syntax extension is the correct approach here. Besides, these are just examples really, and syntax extensions are not heavyweight, they're actually more performant than whatever a posteriori, fragile AST parsing I could come up with. |
* default to safe * fix setter test
This is too strict, as it prevents the use of dynamically loaded extensions: see commonmark#123 (comment). Documented in man page and public header that one should use the same memory allocator for every node in a tree.
Reference discussion: #100
I just rebased the branch on master and tested it, make passes with no new warnings all along the branch, valgrinding a table test shows no leaks.
I applied the initial review comments from @jgm and @nwellnhof , which were to not expose cmark_strbuf, which is thus included in the sources of the reference "core extension", and to not expose API using cmark's bool type, which is replaced by integers, rendering installation of config.h unnecessary.
The design implemented is the second one I proposed in the above discussion, here's a brief summary:
A new structure is exposed,
cmark_syntax_extension
, which exposes hooks that match the block and inline parsing strategies outlined in http://spec.commonmark.org/0.25/#appendix-a-parsing-strategy .This proposal does not cover the implementation of new block types, instead syntax extensions need to create blocks with types defined by cmark itself, including
CMARK_NODE_CUSTOM_BLOCK
andCMARK_NODE_CUSTOM_INLINE
, this already covers a wide range of use cases.That is because in my opinion, we want cmark to guarantee that the output AST will always be correctly structured (with the exception of custom inlines and blocks), with no "impossible" or "maningless" block nesting situations.
An interesting enhancement would be to refactor the custom inline and block types to let extensions provide virtual methods such as "can_contain" for them, but this will be somehow fragile as "inter-extension negotiation" will not be possible.
An important part of this proposal is that cmark implement and expose block types for which it does not yet have a formal syntax specification, such as tables or strikethroughs. I believe this is desirable because this will allow multiple syntax extensions to compete for a given desired output, without tying cmark's specification to a specific syntax, while still making sure the produced output is correct. It makes sense when looking at it as a testing ground if you will.
It is important to state that this design doesn't prevent us from implementing negotiation in the future if we so desire.
I think this branch is now approaching what I'd consider mergeable quality, with the exception of the plugin code, which I think is mostly correct on Linux, but will not work on Windows. If @nwellnhof is interested in tackling the abstraction for Windows, then all the best, otherwise the code can be merged without plugin loading / discovery for now, as it is already useful for people using cmark as a library (such as I).
Last thing, this last rebase round has been pretty involved, and I'd very much like (if we all agree that extensions are desirable) that merging this branch be made a priority, I will of course address reviews in a timely manner.