Fix the handle_backticks issue #426

Closed
wants to merge 229 commits into from

Kyle-Ye (Contributor) commented Nov 7, 2021

  • More runs in benchmark; .gitignore update
  • Arena allocator
  • Extensions API (Implement parsing extensions #123)
  • Strip extensions API down and separate from core
  • Table extension from c068469 reworked
  • Strikethrough extension from c068469 reworked
  • Autolink extension
  • Tagfilter extension
  • Reduce maximum amount of backticks
  • Get a clean build on MSVC (Don't rely on strnlen being available #5)
  • table: trim cells, fix escaping, cleanup (make mingw fails with undefined reference to _strnlen #4)
  • Initialise openers bottom correctly
  • Fix for inline parser changes
  • Abort if we fail to alloc chunk itself
  • Add a no-crash test
  • Add opaque_free_func to extensions
  • Use opaque instead of user_data in table
  • Rework email autolink as postprocessor
  • Compile shared library for extensions
  • Use extensions in spec test
  • Only escape pipes in commonmark output when necessary
  • Fix Windows build
  • Preserve number of tildes in failed strikethru
  • autolink simplification
  • autolink_delim only works with ")", fix balance behaviour
  • spec update
  • Windows build fix (again)
  • Support UTF-8 domains in autolinks
  • Handle links in quotes correctly.
  • Reference links in tables (Add Makefile target to fuzz with AFL #10)
  • Handle UTF-8 BOM (CRLF support #14)
  • Add CMARK_OPT_GITHUB_PRE_LANG
  • Use
  • Fix empty table cell behaviour (#17)
  • Limit arena
  • Add Dockerfile
  • Add -gfm suffix to artifacts
  • README updates
  • Correct manpages. Fixes #20.
  • Simple tables (#21)
  • Latest spec
  • Skip disabled extensions
  • roundtrip_tests reports results
  • Add GFM version number
  • Plaintext renderer (#25)
  • Remove normalize as an option per #190 (#194)
  • Remove dead/misleading code
  • Add table alignment getters (#29)
  • make install also installs extensions (#32)
  • 0.27.1.gfm.1
  • Add CMARK_GFM_VERSION define.
  • Fix link order of cmark-gfm (#35)
  • Add cmark_syntax_extension_get_private() (#36)
  • Unmark as static
  • Don't scan past an EOL (#37)
  • Regenerate scanner
  • Also exclude \n
  • Update cmark-fuzz for cmark-gfm
  • Latest cmake in Docker
  • Fix a misaligned write
  • 0.27.1.gfm.2
  • Install ninja-build
  • Avoid memcpy'ing NULL pointers (#38)
  • Case-insensitive tagfilter. Fixes #42. (#43)
  • Allocate memory from arena with correct alignment (#40)
  • Use unsigned integer when shifting (#39)
  • 0.27.1.gfm.3
  • 32 nested balanced parens in a link is bananas (#48)
  • 0.27.1.gfm.4
  • Latest spec
  • Fix typo (#52)
  • test: Add test case for pathological collisions
  • references: Fix pathological quadratic behavior
  • Fix pathological test runner on Windows
  • Add casts for MSVC10
  • Update to latest spec
  • Add the idempotent core_extensions_ensure_registered
  • Inline sourcepos (#53)
  • Remove unneeded TODO
  • 0.28.0.gfm.5
  • Sourcepos fixes (#54)
  • 0.28.0.gfm.6
  • Skip strikethroughs when considering emphasis (#55)
  • 0.28.0.gfm.7
  • Autolink should not cause : to be skipped
  • 0.28.0.gfm.8
  • Recursive chevrons are bananas (#49)
  • 0.28.0.gfm.9
  • blocks: Fix quadratic behavior in finalize
  • 0.28.0.gfm.10
  • No empty
  • 0.28.0.gfm.11
  • Period in email must precede alnum (#58)
  • feature test macros in harness
  • Fix install EXPORT target
  • Shift includes around for proper header install (#63)
  • add node.js wrapper (#46)
  • Footnotes (#64)
  • FOOTNOTE_REFERENCE has text content
  • Footnote fix per kivikakk/comrak#44
  • ASCII clean source
  • Add -lcmark-gfmextensions to libcmark-gfm.pc.in
  • Fix extensions with static only
  • Build static on Windows again
  • 0.28.3.gfm.12
  • Add CMARK_OPT_STRIKETHROUGH_DOUBLE_TILDE. Closes #71.
  • add CMARK_OPT_TABLE_PREFER_STYLE_ATTRIBUTES (#86)
  • add tests for --table-prefer-style-attributes (#87)
  • Remove square brackets when rendering HTML for footnotes (#90)
  • Handle deeply nested lists (#95)
  • Expose cmark_node_type CMARK_NODE_TABLE etc., make XCode happy with imported headers. (#96)
  • Debian packaging (#97)
  • Removed meta from list of block tags.
  • Fix spaces on regression test.
  • add regression test from comrak
  • latest spec
  • latest spec
  • regressions.txt has non-specified strikethrough
  • Add example of a Python wrapper which uses libcmark-gfmextensions. (#102)
  • Parse rest of info string as meta (#103)
  • 0.28.3.gfm.13
  • add plaintext render func for strikethru
  • 0.28.3.gfm.14
  • commonmark writer: escape tilde (~). (#106)
  • table extension: cosmetic fix for uniformity of output. (#105)
  • commonmark writer/strikethrough: use two tildes for delimiters. (#104)
  • Normalise header and define names (#109)
  • 0.28.3.gfm.15
  • ~ should not be escaped in href (#110)
  • Footnotes in tables (#112)
  • 0.28.3.gfm.16
  • Allow extension to provide "opaque" alloc function (#89)
  • XML attribute formatters (#116)
  • Add support for tables and strike-through text in the XSLT (#117)
  • 0.28.3.gfm.17
  • Be more strict on matching strikethrough (#120)
  • Remove /debian by suggestion in #122
  • update travis-ci link
  • fix image target
  • Default to safe operation (#123)
  • 0.28.3.gfm.18
  • Prevent out-of-bound memory access. (#124)
  • Limit the recursion in autolink extension. (#125)
  • Add plaintext rendering for footnotes. Otherwise, it crashes in debug (#126)
  • 0.28.3.gfm.19
  • Add GFM extensions to fuzzing harness (#127)
  • Fix a buffer overread in the CMark tables extension. (#128)
  • don't crash on test failure on macos
  • use pledge(2) (#132)
  • check for OpenBSD 5.9+
  • be more liberal in strikethru regression
  • fix misplaced parenthesis
  • add tasklist extension (#94)
  • add changelog entry
  • fix attribution
  • 0.28.3.gfm.20
  • Remove options mask from fuzzing harness (#129)
  • remove the class here
  • Adjustments to how the tasklist generation occurs (#136)
  • Define _DEFAULT_SOURCE to get various posix/gnu glibc functions declared (#137)
  • Add automatic configuration of compiler to get large file support (#138)
  • Fix bug with determining if task is complete & adjust to spec. (#142)
  • Add XML attribute to tasklist (#145)
  • Specify parenthesis matching in autolink extension (#148)
  • Fix valid domain ambiguity (#151)
  • Fix extended email autolink ambiguity (#152)
  • Fix table cannot be recognised without empty line (#154)
  • Fix hard line break example (#155)
  • correct _STATIC_DEFINE flag names
  • import spec changes
  • Change cmark_gfm_extensions_get_tasklist_state to cmark_gfm_extensions_tasklist_state_is_checked (#161)
  • Make "set" methods public, add "set" method for tasklist (#162)
  • Fixes Visual C++ 2019 compiler warnings for x64 targets (#166)
  • Fix bug where tasklist extension was using union in two ways. (#169)
  • Revert "import spec changes"
  • Add link to Tcl bindings. (#171)
  • Correct path to artifact (#173)
  • Rebuild ext_scanners.c with latest re2c.
  • [PATCH] Fix O(n*n) corner-case runtime in GFM's table extension.
  • Add a test.
  • Restore compatibility with other changes.
  • Add Swift Package Manager Support
  • Use pre-set config.h header
  • Track opening backtick count for inline code spans
  • apply block offsets for autolink source position info
  • don't let blocks get end lines before their start lines
  • Add inline directive syntax
  • properly set image/link sourcepos when spanning multiple lines
  • Add ^ to special chars array
  • fixes existing data races
  • add mutex initializer in new header
  • add locks around arena ops
  • tweak definitions of statics in inlines.c
  • add locks around extensions registry ops
  • make locking a compile-time setting
  • add latch macros and use them for registering plugins
  • fix deadlock in arena
  • use pthread_once instead of atomics
  • Add preserve-whitespace and inline-only options
  • Allow all whitespace when preserving whitespace
  • Don't emit an attribute node if it doesn't have parentheses
  • update use of mutexes
  • move global characters arrays into the parser
  • free special char blocks alongside the parser
  • don't reset the special-char blocks in parser_reset
  • add comment about freeing special-chars memory
  • save special_chars/skip_chars in parser_reset
  • don't leak my_ext in the parser_interrupt test
  • Add custom attributes using ^[foo][N] syntax
  • Preserve leading newlines when CMARK_OPT_PRESERVE_WHITESPACE is set
  • add cmark-gfm-bin target to Package.swift
  • add .build/.swiftpm to gitignore
  • add api_test to Package.swift
  • fix warning in api_test
  • add explicit modulemap for cmark-gfm
  • add intention to take upstream changes
  • Update README.md
  • [Bugfix] Fix the backticks bug

Yuki Izumi and others added 30 commits June 30, 2017 12:03
This allocator allocates a 4MiB arena into which all allocations are
made, and then increasingly larger arenas as earlier ones are used up.
Freeing memory in the arena is a no-op: clean all memory with
cmark_arena_reset().

In order to support realloc, we store the size of each allocation in a
size_t before the returned pointer.

The speedup is over 25% on large (benchmark-sized) inputs -- we pay a
small increase in maximum RSS (~10%) for this.
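The scheme described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the cmark-gfm source: every name (`arena_alloc`, `arena_realloc`, `arena_free`, `arena_reset`, `ARENA_ALIGN`) is hypothetical, doubling is only one assumed reading of "increasingly larger", and the 16-byte header padding is an assumption made to keep returned pointers aligned. Size overflow checks are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; a sketch of the idea, not cmark-gfm code. */
#define ARENA_INITIAL_SIZE (4u * 1024u * 1024u) /* first arena: 4 MiB */
#define ARENA_ALIGN 16u /* assumed alignment; also the padded header size */

typedef struct arena_chunk {
  size_t size;              /* capacity of data, in bytes */
  size_t used;              /* bytes handed out so far */
  unsigned char *data;
  struct arena_chunk *prev; /* earlier, already used-up arena */
} arena_chunk;

static arena_chunk *arena_head = NULL;

static size_t round_up(size_t n) {
  return (n + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1);
}

static arena_chunk *chunk_new(size_t size, arena_chunk *prev) {
  arena_chunk *c = malloc(sizeof *c);
  if (!c) abort(); /* abort if we fail to allocate the chunk itself */
  c->data = calloc(1, size);
  if (!c->data) abort();
  c->size = size;
  c->used = 0;
  c->prev = prev;
  return c;
}

void *arena_alloc(size_t n) {
  size_t need = ARENA_ALIGN + round_up(n); /* header + payload */
  if (!arena_head || arena_head->size - arena_head->used < need) {
    /* Assumption: each new arena is twice the size of the previous one. */
    size_t next = arena_head ? arena_head->size * 2 : ARENA_INITIAL_SIZE;
    if (next < need) next = need;
    arena_head = chunk_new(next, arena_head);
  }
  unsigned char *p = arena_head->data + arena_head->used + ARENA_ALIGN;
  arena_head->used += need;
  ((size_t *)p)[-1] = n; /* size stored in a size_t just before the pointer */
  return p;
}

void *arena_realloc(void *ptr, size_t n) {
  void *fresh = arena_alloc(n);
  if (ptr) {
    size_t old = ((size_t *)ptr)[-1]; /* recover the old allocation size */
    memcpy(fresh, ptr, old < n ? old : n);
  }
  return fresh;
}

void arena_free(void *ptr) { (void)ptr; } /* freeing is a no-op */

void arena_reset(void) { /* releases every arena at once */
  while (arena_head) {
    arena_chunk *prev = arena_head->prev;
    free(arena_head->data);
    free(arena_head);
    arena_head = prev;
  }
}
```

Because individual frees do nothing, the stored size is the only per-allocation bookkeeping needed: `arena_realloc` reads it back to know how many bytes to copy into the fresh block.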
Note this includes a hack to the core code to escape pipes in the
'commonmark' renderer.  This is to fix test cases with the table
extension; i.e. we treat pipes as special characters that need escaping.

We use the cmark_mem of the parser in order to ensure we use the arena
allocator when necessary.  A very flexible table format is supported;
see test/extensions.txt for examples.  Leading and trailing pipes can be
omitted, and alignment specifiers can be used in the separator between
the header and body.  Table bodies don't need to be a consistent width.
Embedded HTML is OK.

Note we reuse the inline parser from cmark to parse tables -- this is to
ensure pipes e.g. in the middle of an inline code block don't
prematurely terminate a table cell.
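To illustrate the pipe-escaping idea mentioned above, here is a minimal sketch of a helper that writes `\|` for each `|` when emitting text. The function `escape_pipes` is hypothetical and is not part of cmark-gfm; the actual renderer handles escaping inside its own output routines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a cmark-gfm function: copy `in` to `out`,
 * writing "\|" for every '|'. Returns the number of bytes written
 * (excluding the NUL), or (size_t)-1 if `out` (capacity `cap`) is too small. */
size_t escape_pipes(const char *in, char *out, size_t cap) {
  size_t j = 0;
  if (cap == 0) return (size_t)-1;
  for (size_t i = 0; in[i] != '\0'; i++) {
    size_t need = (in[i] == '|') ? 2 : 1;
    if (j + need >= cap) return (size_t)-1; /* keep room for the NUL */
    if (in[i] == '|') out[j++] = '\\';
    out[j++] = in[i];
  }
  out[j] = '\0';
  return j;
}
```

With a sufficiently large buffer, the input `a|b` comes out as `a\|b`, so a table cell containing a literal pipe survives a round trip through the commonmark output.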
This is quite straightforward; we do take care in other extensions (i.e.
autolink) to ensure tildes are left for the strikethrough extension to
consume.

The autolinker is based on https://github.com/vmg/rinku with some
additional changes and fixes.  We do our best not to include
punctuation, but to include matching parentheses within a link.
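One way to picture the punctuation and parenthesis handling is the sketch below. It is an assumption-laden illustration rather than the extension's code: `autolink_end` is a hypothetical name, and the trailing punctuation set is the one listed in the GFM spec for extended autolinks, which may differ from what this commit implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given a candidate link s[0..len), return how many
 * bytes to keep. Trailing punctuation is dropped, and a trailing ')' is
 * kept only while the parentheses in the kept part are balanced. */
size_t autolink_end(const char *s, size_t len) {
  while (len > 0) {
    char c = s[len - 1];
    if (c != '\0' && strchr("?!.,:*_~", c)) {
      len--; /* trailing punctuation is not part of the link */
    } else if (c == ')') {
      size_t open = 0, close = 0;
      for (size_t i = 0; i < len; i++) {
        if (s[i] == '(') open++;
        else if (s[i] == ')') close++;
      }
      if (close <= open) break; /* balanced: the ')' belongs to the link */
      len--;                    /* unmatched ')': leave it outside */
    } else {
      break;
    }
  }
  return len;
}
```

Under these assumptions, `www.example.com/a_(b)` is kept whole, while `www.example.com/a_(b)).` is cut back to `www.example.com/a_(b)`.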
When we encounter a tag that causes an HTML 5 parser's content model
flag [1] to be changed to RCDATA, CDATA or RAWTEXT [2] [3], we escape
the tag by replacing its opening "<" with "&lt;".  This causes the tag
to appear verbatim in the page it's placed on.

We do this to prevent users breaking the page content, where the parser
would not interpret further tags as inserted by cmark as HTML until a
matching close tag was hit.  (Such a closing tag could exist if a user
entered it themselves, but it'd cause all cmark-generated markup in
between to be rendered raw, and is unlikely to be desirable behaviour.)

[1] https://www.w3.org/TR/2009/WD-html5-20090423/syntax.html#tokenization
[2] https://www.w3.org/TR/2009/WD-html5-20090212/serializing-html-fragments.html#parsing-html-fragments
[3] https://github.com/google/gumbo-parser/blob/aa91b27b02c0c80c482e24348a457ed7c3c088e0/src/parser.c#L4023-L4053
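The filtering step described above might look roughly like this. The sketch is not the extension's code: `tag_is_filtered` and `filter_tag` are hypothetical names, the tag list is taken from the GFM spec's description of the tagfilter extension, and treating closing tags the same way as opening tags is an assumption. Matching is case-insensitive, in line with the later "Case-insensitive tagfilter" commit in the list.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Tag names as listed in the GFM spec's tagfilter section. */
static const char *const filtered_tags[] = {
  "title", "textarea", "style", "xmp", "iframe",
  "noembed", "noframes", "script", "plaintext"
};

/* Hypothetical helper: returns 1 if `tag` (starting at '<') opens or
 * closes one of the filtered tags, compared case-insensitively. */
int tag_is_filtered(const char *tag) {
  if (tag[0] != '<') return 0;
  const char *name = tag + 1;
  if (*name == '/') name++; /* assumption: closing tags are filtered too */
  for (size_t i = 0; i < sizeof filtered_tags / sizeof filtered_tags[0]; i++) {
    size_t n = strlen(filtered_tags[i]);
    size_t k = 0;
    while (k < n && tolower((unsigned char)name[k]) == filtered_tags[i][k]) k++;
    if (k < n) continue; /* name differs or input ended early */
    char end = name[n];  /* the tag name must stop here */
    if (end == '>' || end == '/' || isspace((unsigned char)end)) return 1;
  }
  return 0;
}

/* Hypothetical helper: copy `tag` to `out`, replacing the opening '<' with
 * "&lt;" when the tag is filtered. Returns 0 on success, -1 on bad input
 * or if `out` (capacity `cap`) is too small. */
int filter_tag(const char *tag, char *out, size_t cap) {
  if (tag[0] != '<') return -1;
  const char *prefix = tag_is_filtered(tag) ? "&lt;" : "<";
  size_t plen = strlen(prefix), rest = strlen(tag + 1);
  if (plen + rest + 1 > cap) return -1;
  memcpy(out, prefix, plen);
  memcpy(out + plen, tag + 1, rest + 1); /* includes the NUL */
  return 0;
}
```

Only the opening `<` changes, so a filtered tag such as `<script>` is emitted as `&lt;script>` and shows up verbatim on the page, while other tags pass through untouched.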
* Include table alignment when rendering LaTeX
* Include table alignment when rendering man (preserving default centre
  alignment here)
* Trim table cell interiors
* Expand test cases
* Fix escaping behaviour
* Do not use enum for alignment
* Do not collide against stdlib `ispunct`
* Cleanup pipe code
* Don't reparse matched rows

(caller requires :// anyway)

* Add failing test.

* Fix by parsing inlines after blocks are done
QuietMisdreavus and others added 27 commits August 3, 2021 08:51
since cmark_parser_reset is called in cmark_parser_finish, this state
would be inconsistent if you reused a parser with extensions multiple
times.

this silences the "header should be renamed to be used as an umbrella
header" warning
@Kyle-Ye Kyle-Ye closed this Nov 7, 2021
@Kyle-Ye Kyle-Ye deleted the gfm branch November 7, 2021 17:01
@Kyle-Ye Kyle-Ye restored the gfm branch November 7, 2021 17:03