convert baseurl links w/ fragments #1036

Merged
merged 4 commits into from
Feb 24, 2022
25 changes: 18 additions & 7 deletions static/js/lib/ckeditor/plugins/ResourceLinkMarkdownSyntax.test.ts
Original file line number Diff line number Diff line change
@@ -6,7 +6,9 @@ import { createTestEditor, markdownTest } from "./test_util"
import { turndownService } from "../turndown"

import { RESOURCE_LINK } from "@mitodl/ckeditor5-resource-link/src/constants"
-import ResourceLinkMarkdownSyntax from "./ResourceLinkMarkdownSyntax"
+import ResourceLinkMarkdownSyntax, {
+encodeShortcodeArgs as encode
+} from "./ResourceLinkMarkdownSyntax"
import Paragraph from "@ckeditor/ckeditor5-paragraph/src/paragraph"

const getEditor = createTestEditor([
@@ -44,9 +46,10 @@ describe("ResourceLink plugin", () => {

markdownTest(
editor,
-'{{< resource_link 1234-5678 "link text" "some-header-id" >}}',
-`<p><a class="resource-link" data-uuid="${encodeURIComponent(
-"1234-5678#some-header-id"
+'{{< resource_link 1234-5678 "link text" "#some-header-id" >}}',
+`<p><a class="resource-link" data-uuid="${encode(
+"1234-5678",
+"#some-header-id"
)}">link text</a></p>`
)
})
@@ -56,7 +59,9 @@ describe("ResourceLink plugin", () => {
markdownTest(
editor,
'{{< resource_link asdfasdfasdfasdf "text here" >}}',
-'<p><a class="resource-link" data-uuid="asdfasdfasdfasdf">text here</a></p>'
+`<p><a class="resource-link" data-uuid="${encode(
+"asdfasdfasdfasdf"
+)}">text here</a></p>`
)
})

@@ -65,7 +70,11 @@ describe("ResourceLink plugin", () => {
markdownTest(
editor,
'dogs {{< resource_link uuid1 "woof" >}} cats {{< resource_link uuid2 "meow" >}}, cool',
-'<p>dogs <a class="resource-link" data-uuid="uuid1">woof</a> cats <a class="resource-link" data-uuid="uuid2">meow</a>, cool</p>'
+`<p>dogs <a class="resource-link" data-uuid="${encode(
+"uuid1"
+)}">woof</a> cats <a class="resource-link" data-uuid="${encode(
+"uuid2"
+)}">meow</a>, cool</p>`
)
})

@@ -75,7 +84,9 @@ describe("ResourceLink plugin", () => {
.processor as unknown) as MarkdownDataProcessor
expect(md2html('{{< resource_link uuid123 "bad \\" >}}')).toBe(
// This is wrong. Should not end in &lt;/a&gt;
-'<p><a class="resource-link" data-uuid="uuid123">bad &lt;/a&gt;</p>'
+`<p><a class="resource-link" data-uuid="${encode(
Comment on lines -78 to +87
Contributor Author

Using encodeURIComponent(JSON.stringify(args)) for multiple args seems simple, but it means that 'uuid1' shows up as '%5B%22uuid1%22%5D' instead of just 'uuid1'. We could probably avoid this if we really wanted to, but handling one arg the same way as two args seems clean, if aesthetically unappealing in tests.

Contributor

I'm a little confused by this. Are you saying that because it's encoding both class and data-uuid, it will include the quotation marks for both as part of the string?

Contributor Author

I'm going to write a slightly longer reply because I also want to share it with Alice, whom I have not talked to about this change yet but should probably make aware.

I'm not sure what class you refer to, but btw: data-uuid is an outdated name. Previously it was a uuid. Now it's a string that encodes a uuid and possibly more information (like the anchor). Changing the name of that attribute would break our CKEditor ResourceLink plugin (which is in a different repo, https://github.com/mitodl/ckeditor5). We probably should change the attribute name from data-uuid to something else, like data-link (which ironically might have been the original name in the Link plugin, I'm not sure).

If that sounds weird...

My understanding from talking with @alicewriteswrongs is: In CKEditor, links are modeled as an attribute on text and text in CKEditor can only have one attribute. For our resource link stuff, the value of that attribute is determined by the data-uuid attribute on the anchor tag.

# to/from CKEditor:
Markdown --> HTML --> CKEditor internal representation
         <--      <--
1. convert shortcode to text + metadata (metadata = uuid + anchor)
2. store metadata on anchor data-uuid attribute
3. CKEditor uses the data-uuid attribute as the single attribute for the
   link text in its internal representation
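
The Markdown <--> HTML half of the flow above can be sketched roughly as follows. This is only an illustration under assumptions: the regexes and the names encodeArgs, decodeArgs, shortcodeToHtml and htmlToShortcode are hypothetical, and the real plugin registers its rules with the Markdown converter and Turndown instead of calling String.replace directly.

```typescript
// Hypothetical helpers, for illustration only.
const encodeArgs = (...args: string[]): string =>
  encodeURIComponent(JSON.stringify(args))
const decodeArgs = (encoded: string): string[] =>
  JSON.parse(decodeURIComponent(encoded))

// Assumed shortcode shape: {{< resource_link UUID "text" "#fragment" >}}
const SHORTCODE = /\{\{< resource_link (\S+) "(.*?)"(?: "(.*?)")? >\}\}/g
// Assumed anchor shape, matching what shortcodeToHtml emits below.
const ANCHOR = /<a class="resource-link" data-uuid="([^"]*)">(.*?)<\/a>/g

// Steps 1 and 2: shortcode -> text + metadata, metadata stored on data-uuid.
function shortcodeToHtml(markdown: string): string {
  return markdown.replace(
    SHORTCODE,
    (_match: string, uuid: string, text: string, fragment?: string) => {
      const encoded = fragment ? encodeArgs(uuid, fragment) : encodeArgs(uuid)
      return `<a class="resource-link" data-uuid="${encoded}">${text}</a>`
    }
  )
}

// Reverse direction: read the single attribute value back into a shortcode.
function htmlToShortcode(html: string): string {
  return html.replace(
    ANCHOR,
    (_match: string, encoded: string, text: string) => {
      const [uuid, fragment] = decodeArgs(encoded)
      return fragment
        ? `{{< resource_link ${uuid} "${text}" "${fragment}" >}}`
        : `{{< resource_link ${uuid} "${text}" >}}`
    }
  )
}
```

Step 3 (CKEditor picking up data-uuid as the link's single value) happens inside the ResourceLink plugin and is not modeled here.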

The important thing is: In CKEditor's model of a link, it can store the text and one other value. (No more than one other value). For normal links, that "one value" is the url. For our resource_links, that one value was previously the uuid.

But now we need to store two pieces of information in that one value: the uuid and the fragment id.

The way ResourceLink plugin is set up, the "one value" is the same as "data-uuid" attribute on the <a></a> tag.

In Alice's PR (#1037) the two pieces of information were encoded as uuid#fragment, i.e., "piece1#piece2" with '#' as the separator. That was really simple, but (1) it makes including the '#' piece in the shortcode value slightly harder, and (2) it precludes us from using this for search params (?foo=1&bar=2) if we ever needed to.
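
To make the first drawback concrete, here is a small sketch of the '#'-separator encoding, assuming the fragment argument itself starts with '#' (as it does in the updated tests). The helper names encodeWithHash and decodeWithHash are hypothetical.

```typescript
// Hypothetical helpers mirroring the "uuid#fragment" encoding.
const encodeWithHash = (uuid: string, fragment?: string): string =>
  encodeURIComponent(fragment ? `${uuid}#${fragment}` : uuid)

const decodeWithHash = (encoded: string): string[] =>
  decodeURIComponent(encoded).split("#")

// Fine as long as the fragment has no leading '#':
const plain = decodeWithHash(encodeWithHash("1234-5678", "some-header-id"))
// plain is ["1234-5678", "some-header-id"]

// Ambiguous once the fragment carries its own '#':
const hashed = decodeWithHash(encodeWithHash("1234-5678", "#some-header-id"))
// hashed is ["1234-5678", "", "some-header-id"], so destructuring
// [uuid, anchor] from it yields an empty anchor.
```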

So in this PR, I switched the encoding to just use JSON.stringify/parse to encode/decode the multiple pieces (uuid+anchor) to/from a single value. Except the output of JSON.stringify can't be used as an HTML attribute value since it contains quotation marks, so I had to use encodeURIComponent/decode....

This works well for passing 2 (or more) args as a single value to/from CKEditor.

All I meant in my original comment was: It also works for passing 1 argument, but it's kinda ugly because the single argument also has the quotations and square brackets from JSON.stringify:

const uuid = 'uuid1'
const asJson = JSON.stringify([uuid]) // stringify an array for consistency with the two-argument case
// out: the string '["uuid1"]'
const encoded = encodeURIComponent(asJson)
// out: the string '%5B%22uuid1%22%5D'

Contributor

I think maybe we should call the attr data-link-attrs or something like that. If we want to change it, we just need to release a new version of our forked link package with the change, then make a change here in Studio that upgrades to that version and updates our resource link markdown code. Since this is all in-memory CKEditor state, we shouldn't break anything by doing that. All this wrangling is really just about how to pass this data through CKEditor while the content is actually open in the editor.

As far as the aesthetic aspects of using the JSON thing for one argument, I have no objections to the approach you outlined. What we need is something that lets us losslessly map one or two short strings to a single string and back again, and I think JSON.stringify | encodeURIComponent should give us that.
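
A quick sketch of that round-trip property, using the same JSON.stringify + encodeURIComponent composition as this PR. The roundTrip helper and the sample values are made up for illustration.

```typescript
const encodeShortcodeArgs = (...args: (string | undefined)[]): string =>
  encodeURIComponent(JSON.stringify(args))

const decodeShortcodeArgs = (encoded: string): (string | null)[] =>
  JSON.parse(decodeURIComponent(encoded))

// Hypothetical check: decode(encode(args)) gives back the same strings,
// and the encoded form contains no double quote.
const roundTrip = (...args: string[]): boolean => {
  const encoded = encodeShortcodeArgs(...args)
  const decoded = decodeShortcodeArgs(encoded)
  return (
    !encoded.includes('"') && // usable inside a double-quoted HTML attribute
    decoded.length === args.length &&
    decoded.every((value, i) => value === args[i])
  )
}

const samples: string[][] = [
  ["uuid1"],
  ["1234-5678", "#some-header-id"],
  ["uuid2", "?foo=1&bar=2"],
  ["uuid3", 'frag "with" quotes & <angle> brackets']
]
const allOk = samples.every(sample => roundTrip(...sample))
```

One caveat: JSON.stringify serializes an undefined array element as null, so an explicitly passed undefined argument would decode as null, not undefined.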

Contributor Author

@ChristopherChudzicki ChristopherChudzicki Feb 23, 2022

Contributor

@ChristopherChudzicki thanks for the explanation, I now have a greater understanding of how this works and what you meant by a single argument being "ugly." I don't think it matters as long as the data comes out as expected on the other side.

+"uuid123"
+)}">bad &lt;/a&gt;</p>`
)
})
})
18 changes: 12 additions & 6 deletions static/js/lib/ckeditor/plugins/ResourceLinkMarkdownSyntax.ts
Original file line number Diff line number Diff line change
@@ -10,6 +10,12 @@ import {
RESOURCE_LINK
} from "@mitodl/ckeditor5-resource-link/src/constants"

+export const encodeShortcodeArgs = (...args: (string | undefined)[]) =>
+encodeURIComponent(JSON.stringify(args))
+
+const decodeShortcodeArgs = (encoded: string) =>
+JSON.parse(decodeURIComponent(encoded))
+
/**
* (\S+) to match and capture the UUID
* "(.*?)" to match and capture the label text
@@ -57,10 +63,10 @@ export default class ResourceLinkMarkdownSyntax extends MarkdownSyntaxPlugin {
linkText: string,
fragment?: string
) => {
-const formattedUUID = fragment ? `${uuid}#${fragment}` : uuid
-return `<a class="${RESOURCE_LINK_CKEDITOR_CLASS}" data-uuid="${encodeURIComponent(
-formattedUUID
-)}">${linkText}</a>`
+const encoded = fragment ?
+encodeShortcodeArgs(uuid, fragment) :
+encodeShortcodeArgs(uuid)
+return `<a class="${RESOURCE_LINK_CKEDITOR_CLASS}" data-uuid="${encoded}">${linkText}</a>`
}
}
]
@@ -79,9 +85,9 @@ export default class ResourceLinkMarkdownSyntax extends MarkdownSyntaxPlugin {
)
},
replacement: (_content: string, node: Turndown.Node): string => {
-const [uuid, anchor] = decodeURIComponent(
+const [uuid, anchor] = decodeShortcodeArgs(
 (node as any).getAttribute("data-uuid") as string
-).split("#")
+)

if (anchor) {
return `{{< resource_link ${uuid} "${node.textContent}" "${anchor}" >}}`
Original file line number Diff line number Diff line change
@@ -79,7 +79,7 @@ class BaseurlReplacementRule(MarkdownCleanupRule):
 regex = (
     r"\\?\[(?P<title>[^\[\]\n]*?)\\?\]"
     + r"\({{< baseurl >}}(?P<url>.*?)"
-    + r"(/?#(?P<fragment>.*?))?"
+    + r"(/?(?P<fragment>#.*?))?"
     + r"\)"
 )

@@ -94,8 +94,6 @@ def __call__(self, match: re.Match, website_content: WebsiteContent):
escaped_title = match.group("title").replace('"', '\\"')
url = match.group("url")
fragment = match.group("fragment")
-if fragment is not None:
-    return original_text

# This is probably a link with image as title, where the image is a < resource >
if R"{{<" in match.group("title"):
@@ -105,8 +103,7 @@ def __call__(self, match: re.Match, website_content: WebsiteContent):
linked_content = self.content_lookup.get_content_by_url(
website_content.website_id, url
)
-return (
-f'{{{{< resource_link {linked_content.text_id} "{escaped_title}" >}}}}'
-)
+fragment_arg = f' "{fragment}"' if fragment is not None else ""
+return f'{{{{< resource_link {linked_content.text_id} "{escaped_title}"{fragment_arg} >}}}}'
except KeyError:
return original_text
Original file line number Diff line number Diff line change
@@ -43,13 +43,13 @@ def get_markdown_cleaner(website_contents):
R"This link should change [text title]({{< baseurl >}}/resources/path/to/file1) cool",
R'This link should change {{< resource_link content-uuid-1 "text title" >}} cool',
),
-( # should not touch fragments
-R"This link includes a fragment [text title]({{< baseurl >}}/resources/path/to/file1#some-fragment) cool",
+(
 R"This link includes a fragment [text title]({{< baseurl >}}/resources/path/to/file1#some-fragment) cool",
+R'This link includes a fragment {{< resource_link content-uuid-1 "text title" "#some-fragment" >}} cool',
),
-( # should not touch fragments with / before #
-R"This link includes a fragment with slash first [text title]({{< baseurl >}}/resources/path/to/file1/#some-fragment) cool",
+(
 R"This link includes a fragment with slash first [text title]({{< baseurl >}}/resources/path/to/file1/#some-fragment) cool",
+R'This link includes a fragment with slash first {{< resource_link content-uuid-1 "text title" "#some-fragment" >}} cool',
),
(
# < resource_link > short code is only for textual titles