Allow any valid HTML4 identifier string to be a djot identifier string #303
Currently the syntax for attributes (undocumented except in code comments) is
So we don't allow

Class names have more restrictions (at least if they're to be used with CSS).

EDIT: Anyway, I'm open to making this less restrictive, but some thought needs to go into what would be a reasonable restriction.
At first glance, it seems like djot follows a principle of not distinguishing between the first character and the other characters of an id, possibly for simplicity of implementation? Which dictates that
As I understand it, HTML4 ids are generally extremely restrictive, because they follow the SGML rules laid out in ISO 8879:1986. The only case I see where djot is more restrictive than HTML4 is that "foo.bar" is a valid HTML4 identifier but an invalid djot identifier, because it contains a period.

I have one firm proposal, which is to disentangle the identifier and class rules so that non-initial identifier characters may be periods. I.e.:
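A minimal sketch of that proposal, assuming a hypothetical character rule: `is_name_char` (letters and digits in any script, plus `-`, `_`, `:`) is a stand-in chosen for illustration, not djot's actual attribute grammar, which lives in the parser source. Class names keep the period-free rule; ids additionally accept periods anywhere after the first character.

```python
# Hypothetical character rules for illustration only; djot's real attribute
# grammar may allow a different set of characters.
def is_name_char(ch: str) -> bool:
    """Assumed set of characters allowed anywhere in an id or class name."""
    return ch.isalnum() or ch in "-_:"


def is_valid_class(name: str) -> bool:
    # Class names keep the stricter rule: no periods at all, because a
    # period introduces the next class in the attribute syntax.
    return name != "" and all(is_name_char(c) for c in name)


def is_valid_id(name: str) -> bool:
    # Proposed rule: the first character follows the class rule, while
    # later characters may additionally be periods.
    if name == "" or not is_name_char(name[0]):
        return False
    return all(is_name_char(c) or c == "." for c in name[1:])
```

Under this sketch `is_valid_id("foo.bar")` holds while `is_valid_class("foo.bar")` and `is_valid_id(".foo")` do not, so a leading period still unambiguously starts a class.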
My goals would be served equally well by the parser accepting periods in ids in any position but requiring them to be escaped.

I don't have opinions about any larger related changes, though I do like that Unicode characters can be used in djot id and class names. Just for context, I should say that my interest here is not primarily in writing djot, but in getting things into djot's AST, which is much nicer to work with than pandoc's for my purposes.
Thanks for your work; I am excited about using this project.
I'm converting some Markdown files to djot with pandoc and am hitting an unfortunate behavior. I'm uncertain whether it's a bug in pandoc's djot writer, a needed change in djot, or neither; I would be willing to contribute to either codebase if it is one.
produces
which parses to
The same text without the period at the end compiles to the desired
Of course djot can set whatever rules it wants on what belongs in an ID, which implies the pandoc writer should not be writing a djot-invalid identifier; but unless I'm missing something, the simpler solution would seem to be allowing any valid SGML and HTML4 identifier to be a valid djot identifier, where "ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".")."
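The quoted HTML4 rule translates directly into a regular expression. This is a small sketch of a checker for it (the name `is_html4_id` is illustrative); `re.fullmatch` is used so that a trailing newline cannot sneak through the way it can with a `$` anchor.

```python
import re

# HTML 4.01 ID and NAME tokens: one ASCII letter, then any number of ASCII
# letters, digits, hyphens, underscores, colons, and periods.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")


def is_html4_id(token: str) -> bool:
    """Return True if token is a valid HTML4 ID/NAME token."""
    return HTML4_ID.fullmatch(token) is not None
```

For example, `is_html4_id("foo.bar")` and `is_html4_id("a:b_c-d.e")` are true, while `"1foo"` (leading digit) and `"héllo"` (non-ASCII letter) are rejected. The rule is ASCII-only, so it is stricter than djot about letters in other scripts and more permissive only about periods.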