Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert baseurl links w/ fragments #1036

Merged
merged 4 commits into from
Feb 24, 2022
Merged

Conversation

ChristopherChudzicki
Copy link
Contributor

@ChristopherChudzicki ChristopherChudzicki commented Feb 18, 2022

Pre-Flight checklist

  • Testing
    • Code is tested
    • Changes have been manually tested

What are the relevant tickets? #1015

What's this PR do?

Add markdown_cleanup rule to convert baseurl links with fragments [title]({{< baseurl >}}/pages/whatever/file#some-heading) to resource_links < resource_link uuid "title" "some-heading" >.

Includes changes to our CKeditor so that these resource_links display properly. But DOES NOT support saving these links. Currently saving markdown like < resource_link uuid "title" "some-heading" > results in the some-heading portion being lost. (@alicewriteswrongs is working on the necessary change to our CKEditor ResourceLink plugin.)

How should this be manually tested?

Run

# To generate csv of replacements only
docker-compose exec web python manage.py markdown_cleanup baseurl -o meow2.csv
# to commit changes to database
docker-compose exec web python manage.py markdown_cleanup baseurl --commit

and view the changes in Studio editor.

Some pages that would be affected:

b7cc5225-96b8-bb67-252f-d5048c2b43f4
    course: The Biology of Aging: Age-Related Diseases and Interventions
    page: Readings
df5e63ff-8c3f-cf7e-32cd-20fae4ed96c7
    course: Design of Electromechanical Robotic Systems
    page: Calendar
db02fb8f-2784-8eb9-d53c-72fe22be0540
    Immune Cell Migration: On the Move in Response to Pathogens and Cancer Immunotherapy
    Readings

@codecov-commenter
Copy link

codecov-commenter commented Feb 18, 2022

Codecov Report

Merging #1036 (9530e61) into master (934657b) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1036   +/-   ##
=======================================
  Coverage   91.10%   91.10%           
=======================================
  Files         200      200           
  Lines        7652     7656    +4     
  Branches     1333     1333           
=======================================
+ Hits         6971     6975    +4     
  Misses        596      596           
  Partials       85       85           
Impacted Files Coverage Δ
...lib/ckeditor/plugins/ResourceLinkMarkdownSyntax.ts 100.00% <100.00%> (ø)
...agement/commands/markdown_cleaning/baseurl_rule.py 96.07% <100.00%> (-0.08%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 934657b...9530e61. Read the comment docs.

alicewriteswrongs added a commit that referenced this pull request Feb 18, 2022
We recently added support for an ID param on resource links
(mitodl/ocw-hugo-themes#444) in the course theme. We don't yet have
these in prod or anywhere, but soon we'll be converting some `{{<
baseurl >}}-based` links (#1036)  over to this format, so we want to
have support for it in CKEditor ready to go.

pr #1037
Copy link
Contributor

@gumaerc gumaerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about how the URL is going to be reconstructed in the theme:

R"This link includes a fragment with slash first [text title]({{< baseurl >}}/resources/path/to/file1/#some-fragment) cool",
R'This link includes a fragment with slash first {{< resource_link content-uuid-1 "text title" "some-fragment" >}} cool',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking at the updated resource_link.html shortcode in ocw-hugo-themes:

{{- $uuid := index .Params 0 -}}
{{- $title := index .Params 1 -}}
{{- $anchor_id := index .Params 2 | default ""}}
{{- range where $.Site.Pages "Params.uid" $uuid -}}
    <a href="{{- .Permalink -}}{{- $anchor_id -}}">{{ $title }}</a>
{{- end -}}

We're fetching fragment_id there from the second argument, but it looks like the fragment itself here doesn't include the # character. Are we expecting that character to be in the template, or passed in? If we're not going to put it in the theme, this new argument could be used even to just put in any query string on the URL. Either way, it doesn't currently look like the # will be preserved, or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhhhh good catch. I was testing this in studio but hadn't followed a link yet.

And yes, I agree—leaving the second arg flexible for # or ? is good.

@gumaerc gumaerc self-assigned this Feb 18, 2022
@ChristopherChudzicki
Copy link
Contributor Author

ChristopherChudzicki commented Feb 22, 2022

@gumaerc Thanks for catching this. I've made two updates to the PR:

  • change slightly how ResourceLinkMarkdownSyntax was handling the non-uuid fragment/hash portion. Before we were relying on splitting at '#'. Now we aren't, which makes it a little easier to include the '#' and would allow ? querystrings in future if needed
  • changed the management command to keep the '#'

I've now tested this with ocw-hugo-thmes locally using https://github.mit.edu/ocw-content-rc/16.90-spring-2014 ... resource _links link-to-sections are working for me in ocw-hugo-themes now. Should have tested that before.

Comment on lines -78 to +87
'<p><a class="resource-link" data-uuid="uuid123">bad &lt;/a&gt;</p>'
`<p><a class="resource-link" data-uuid="${encode(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using encodeURIComponent(JSON.stringify(args)) for multiple args seems simple, but it means that 'uuid1' shows up as '%5B%22uuid1%22%5D' instead of just `'uuid1'. We could probably avoid this if we really want, but handling 1 args the same as 2 args seems clean, if aesthetically unappealing in tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by this, what you're saying is that because it's encoding both class and data-uuid that it will include the quotations for both as part of the string.?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to write this a slightly longer reply because I also want to share it with Alice, whom I have not talked to about this change yet but should probably make aware

I'm not sure what class you refer to, but btw: data-uuid is an outdated name. Previously it was a uuid. Now it's a string that encodes a uuid and possibly more information (like the anchor). Changing the name of that attribute would break our CKEditor ResourceLink plugin (which is in a different repo, https://github.com/mitodl/ckeditor5). We probably should change the attribute name from data-uuid to something else, like data-link (which ironically might have been the original name in the Link plugin, I'm not sure).

If that sounds weird...

My understanding from talking with @alicewriteswrongs is: In CKEditor, links are modeled as an attribute on text and text in CKEditor can only have one attribute. For our resource link stuff, the value of that attribute is determined by the data-uuid attribute on the anchor tag.

# to/from CKEditor:
Markdown --> HTML --> CKEditor internal representation
         <--      <--
1. convert shortcode to text + metadata (metadata = uuid + anchor)
2. store metadata on anchor data-uuid attribute
3. CKEditor uses the data-uuid attribute as the single attribute for the
   link text in its internal representation

The important thing is: In CKEditor's model of a link, it can store the text and one other value. (No more than one other value). For normal links, that "one value" is the url. For our resource_links, that one value was previously the uuid.

But now we need to store two pieces of information in that one value: the uuid and the fragment id.

The way ResourceLink plugin is set up, the "one value" is the same as "data-uuid" attribute on the <a></a> tag.

In Alice's PR (#1037) the two pieces of information were encoded as: uuid#fragment, i.e., "piece1#piece2" with '#'as the separator. That was really simple but (1) makes including. the '#' piece in the shortcode value slightly harder and also precludes us from using this for search params?foo=1&bar=2` if we ever needed to.

So in this PR, I switched the encoding to just use JSON.stringify/parse to encode/decode the multiple pieces (uuid+anchor) to/from a single value. Except the output of JSON.stringify be an HTML attribute value since it has quotations, so I had to use encodeURIComponent/decode....

This works well for passing 2 (or more) args as a single value to/from CKEditor.

All I meant in my original comment was: It also works for passing 1 argument, but it's kinda ugly because the single argument also has the quotations and square brackets from JSON.stringify:

const uuid = 'uuid1'
const asJson = JSON.stringify(['uuid1']) // stringify an array for consistency with the two-argument case
// out: the string '["uuid1"]'
const encoded = encodeURIComponent(asJson)
// out: the string '%5B%22uuid1%22%5D'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe we should call the attr data-link-attrs or something like that. If we want to change it we just need to release a new version of our forked link package with the change, and then introduce a change here in Studio which upgrades to that version and changes our resource link markdown code - since this stuff is all in-memory CKEditor stuff we shouldn't break anything by doing that, all this wrangling just has to do with basically how to pass this data through CKEditor when the content is actually open in the editor.

As far as the aesthetic aspects of using the JSON thing for one argument, I have no objections to the approach you outlined. What we need is something that lets us losslessly map one or two short strings to a single string and back again, I think JSON.stringify | encodeURIComponent should give us that.

Copy link
Contributor Author

@ChristopherChudzicki ChristopherChudzicki Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChristopherChudzicki thanks for the explanation, I now have a greater understanding of how this works and what you meant by a single argument being "ugly." I don't think it matters as long as the data comes out as expected on the other side.

Copy link
Contributor

@gumaerc gumaerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM

This looks good now, although we should consider renaming $anchor_id in resource_link.html in ocw-hugo-themes to $queryString instead. I just have one question about one of your comments for my own clarification:

Comment on lines -78 to +87
'<p><a class="resource-link" data-uuid="uuid123">bad &lt;/a&gt;</p>'
`<p><a class="resource-link" data-uuid="${encode(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by this, what you're saying is that because it's encoding both class and data-uuid that it will include the quotations for both as part of the string.?

@ChristopherChudzicki
Copy link
Contributor Author

I am going to merge this, but will wait for #1047 until actually running on rc

@ChristopherChudzicki ChristopherChudzicki merged commit 169909f into master Feb 24, 2022
@odlbot odlbot mentioned this pull request Feb 24, 2022
4 tasks
ChristopherChudzicki added a commit that referenced this pull request Feb 25, 2022
In #1036 we began encoding the data-uuid argument for < resource_link > shortcodes. The encoding / decoding worked fine for converting between HTML and Markdown.

But in that PR I forgot to encode the data we passed to CKEditor when creating new links.
ChristopherChudzicki added a commit that referenced this pull request Feb 25, 2022
In #1036 we began encoding the data-uuid argument for < resource_link > shortcodes. The encoding / decoding worked fine for converting between HTML and Markdown.

But in that PR I forgot to encode the data we passed to CKEditor when creating new links.
@odlbot odlbot mentioned this pull request Feb 25, 2022
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants