Bold and spans with styles not handled #5

Open
jodamo5 opened this issue Apr 15, 2020 · 4 comments
@jodamo5

jodamo5 commented Apr 15, 2020

Thank you very much for creating this library. It is extremely good!

I have found a problem when making text bold or coloured. Hoping this can be fixed.

Issues with Bold Text

I made a string of the translated text bold. On the front end, bold was applied to both lines of text (the original text as well as the translated text).

Using bold for multiple words also throws out the alignment of the words slightly.

Example

As an example here is the original result with no bold used (so you can see how the alignment is supposed to look):
leipzig_-no_bold-_result

Then here is a screenshot showing the text that I made bold in the editor:

leipzig_-_bold_editor1

This was by just wrapping the bold text in <strong></strong> tags.

Here is a screenshot of the outcome:

leipzig_-bold-_result1

Points to note:

  1. The bold starts with "Samaria" (which matches what we had in the editor) but then the next word in the original language is now bold too.
  2. After the final bold English word, the next word in the original language is also bold and is now out of alignment with the translated (English) text.

Demo Examples

To make it easy to replicate, the same issue can be seen on the http://bdchauvette.net/leipzig.js/demo/ page with these examples:

One Word Bold - Works Fine

If only one word is made bold then the bold works fine:

leipzig-demo-one-word-bold
Link to this demo

More than one word bold - Issue Occurs

But as soon as more than one word is bold, the problem happens:
leipzig-demo-two-words-bold
Link to this demo

Grouped words in Bold - More Problems

And if words are grouped with brackets inside the strong tags it really messes up: it ignores the meaning of the brackets, so the words go out of alignment, and the brackets are shown on the front end:
leipzig-demo-grouped-words-bold
Link to this demo

Issues with Coloured Text

If a section of the text is set to a different colour, using a <span> tag with an inline style, then the style code is output as visible text on the front end.

leipzig-text-colour-issue
Link to this demo

Ideal Outcome

Ideally, a string of text could be bold or coloured and only the text selected as bold or coloured in the backend would output on the front end.

As a fallback, if it is too much work to fix the text colour options, then at least bold could be supported, and other HTML tags with styles would be stripped out and not output on the front end.

@bdchauvette
Owner

Hi @jodamo5! Thanks so much for the detailed issue!

The underlying issue tying all these behaviors together is that the library is not very smart about parsing the text. It's not aware of HTML at all: it currently parses the text as normal strings, then constructs an output string and (dangerously 😬) creates the output HTML by setting that string as the output element's innerHTML.

I'll provide some more info for each of your examples.

One Word Bold - Works Fine

This works fine because a single bold word contains no spaces, so the engine never splits the tag across separate words.

More than one word bold - Issue Occurs

This issue is similar to the problem with attributes. As far as the lexer is concerned, the items end up being something like:

<span>ein</span>
<span><strong>klein</span>
<span>-es</strong></span>

The only workaround I can think of for this is to wrap each separate word in its own <strong> tag, which is unfortunately tedious.

Grouped words in Bold - More Problems

This one is luckily somewhat easy to fix: you need to put the grouping outside of the tags, so the parser is able to tell it's a group, i.e.

ein kleines Beispiel 
ein {<strong>klein-es test</strong>} Beispiel
DET.NOM.N.SG small-AGR.NOM.N.SG example 
"a small example"

demo

Because the engine isn't aware of HTML, if the first character in a word isn't a grouping character, it doesn't realize that it's supposed to be a group, and it ends up parsing it as normal text. If the grouping character is first, with the HTML inside it, then everything works out.
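
To illustrate why the position of the grouping character matters, here is a toy tokenizer. It is only a sketch and is not leipzig.js's actual lexer; the name `toyTokenize` and its rules (split on whitespace, treat `{...}` as a group only when `{` is the first character of a token) are assumptions made for the example.

```javascript
// Toy tokenizer, NOT the library's real lexer. It knows nothing about HTML:
// it splits on whitespace and only recognizes a group when a token
// starts with "{".
function toyTokenize(line) {
  const tokens = [];
  let i = 0;
  while (i < line.length) {
    if (/\s/.test(line[i])) {
      i += 1;
      continue;
    }
    if (line[i] === '{') {
      // Group: take everything up to the closing brace as one token.
      const close = line.indexOf('}', i);
      const end = close === -1 ? line.length : close;
      tokens.push(line.slice(i + 1, end));
      i = end + 1;
    } else {
      // Plain word: read until the next whitespace character.
      let end = i;
      while (end < line.length && !/\s/.test(line[end])) end += 1;
      tokens.push(line.slice(i, end));
      i = end;
    }
  }
  return tokens;
}

// Brace inside the tag: the token starts with "<", so no group is detected.
const braceInside = toyTokenize('ein <strong>{klein-es test}</strong> Beispiel');
// → ['ein', '<strong>{klein-es', 'test}</strong>', 'Beispiel']

// Brace outside the tag: the group is detected and the tag stays intact.
const braceOutside = toyTokenize('ein {<strong>klein-es test</strong>} Beispiel');
// → ['ein', '<strong>klein-es test</strong>', 'Beispiel']
```

With the brace inside the tag, the opening and closing `<strong>` tags land in different tokens and the braces survive as literal text, which matches the behavior reported above.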

Issues with Coloured Text

This is similar to the issue with more than one word bolded: the engine parses the attributes as separate items, so the tag is split across items and the resulting HTML is broken. If you only want a single word colored, you can wrap the whole item in a group, then put the tag inside it, e.g.:

{<span style="color: #00ccff;">διῆλθον</span>}
{passed through...}

demo

If a group of words needs to be colored, just make sure to put the tags inside the group.

If multiple words need to be colored but can't be grouped, you'll unfortunately have to repeat the single-word grouping trick for each element 😞

{<span style="color: #00ccff;">Οἱ</span>} {<span style="color: #00ccff;">μὲν</span>} {<span style="color: #00ccff;">οὖν</span>} {<span style="color: #00ccff;">διασπαρέντες</span>} ἀπὸ τῆς θλίψεως τῆς γενομένης ἐπὶ Στεφάνῳ {<span style="color: #00ccff;">διῆλθον</span>}

The then therefore {being scattered} from the affliction the happening concerning Stephen {passed through...}

demo

You could potentially use some CSS to make this less annoying, maybe by using a specific tag (<em>, <strong>, <i>, etc.), and styling it differently when it's inside a gloss. That would let you fall back to just wrapping each word in a tag, without having to also make it a group.

.gloss__line em {
  color: #0cf;
}

<em>Οἱ</em> <em>μὲν</em> <em>οὖν</em> <em>διασπαρέντες</em> ἀπὸ τῆς θλίψεως τῆς γενομένης ἐπὶ Στεφάνῳ <em>διῆλθον</em>

The then therefore {being scattered} from the affliction the happening concerning Stephen {passed through...}

demo (but you'll have to add the styling yourself in the browser dev tools)


That being said, I think the parser should be more aware of HTML, and we should definitely not be blindly setting the innerHTML for constructing output. Writing an engine that's aware of HTML is unfortunately a lot more work than just doing it for strings, so it'll take me a while to implement. I hope that the workarounds I've put here can help you out in the meantime, even if they're certainly not ideal.

Let me know if you have any more questions!

@jodamo5
Author

jodamo5 commented Apr 20, 2020

Thanks @bdchauvette

It's great to see the easy solution for grouped bold words, putting the { } outside the <strong> tags, as we'll be able to achieve that in a WYSIWYG editor too.

Rather than fully understanding HTML, I wonder if a simpler approach could improve reliability while still giving bold or <em> functionality ...

Phase 1

Before glossing, use JavaScript to strip all HTML tags except <strong> and <em>. This would immediately fix the reliability issues caused by text colour spans, and would still let us do single bold words, etc.
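
A minimal sketch of what Phase 1 could look like, assuming simple, well-formed markup. The helper name `stripUnsupportedTags` is hypothetical and not part of leipzig.js, and a regex like this will misbehave on attribute values that contain `>`, so treat it as an illustration rather than a robust sanitizer.

```javascript
// Hypothetical helper (not part of leipzig.js): drop every HTML tag except
// the allowed ones, and re-emit allowed tags without attributes.
function stripUnsupportedTags(html, allowed = ['strong', 'em']) {
  return html.replace(/<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g, (tag, name) => {
    const lower = name.toLowerCase();
    if (!allowed.includes(lower)) return ''; // remove the tag, keep its text
    return tag.startsWith('</') ? `</${lower}>` : `<${lower}>`;
  });
}

const cleaned = stripUnsupportedTags(
  '<span style="color: #00ccff;">διῆλθον</span> <strong>passed</strong> through'
);
// → 'διῆλθον <strong>passed</strong> through'
```

Only the tags are removed; the text inside an unsupported tag is kept so the word still takes part in the gloss alignment.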

Phase 2

As an additional enhancement, the inner content of any <strong> or <em> tag could be searched for a space. If a space is found, replace it with a closing tag, a space, and an opening tag.
e.g. <strong>test word</strong> would change to
<strong>test</strong> <strong>word</strong>
and then get glossed.
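
The Phase 2 idea might be sketched like this. It assumes the tags carry no attributes (for example after a Phase 1 style clean-up) and are not nested; `splitFormattedWords` is a hypothetical name, not an existing library function. It also does not know about `{ }` groups, so a group inside a tag would be split as well.

```javascript
// Hypothetical helper: rewrite "<strong>a b</strong>" as
// "<strong>a</strong> <strong>b</strong>" so each word carries its own tag.
function splitFormattedWords(html, tags = ['strong', 'em']) {
  const pattern = new RegExp(`<(${tags.join('|')})>([^<]*)</\\1>`, 'g');
  return html.replace(pattern, (match, tag, inner) =>
    inner
      .split(/(\s+)/) // the capture group keeps the whitespace runs
      .map((part) =>
        part === '' || /^\s+$/.test(part) ? part : `<${tag}>${part}</${tag}>`
      )
      .join('')
  );
}

const split = splitFormattedWords('ein <strong>klein-es test</strong> Beispiel');
// → 'ein <strong>klein-es</strong> <strong>test</strong> Beispiel'
```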

I think Phase 1 would be easy enough. I don't know how difficult Phase 2 would be, but I think it would be easier than getting it to process full HTML.

What do you think?

@bdchauvette
Owner

For Phase 1, I definitely agree that we should be glossing based on the raw, unformatted text. I haven't tried this yet, but we should be able to use the innerText property for this (vs. innerHTML like we're currently doing).

For Phase 2, I'd like to avoid a solution that hard codes support for only certain tags. Ideally, we'd be able to reapply any HTML elements to the parsed gloss. I think we should be able to do this by walking the original DOM tree, and mapping the innerText of leaf nodes to the elements of the glossed text. For each matching text segment, we would have to wrap the glossed text in the leaf node, and all of the parent nodes up to the original gloss element. For example, if we have nested tags <em><strong>foo</strong> bar</em>, we would want to apply both the <em> and <strong> tags to foo.

Alternatively, it might be cleaner to do the tree walking first, and build an AST-like structure that contains the various elements as we go. Once we have the AST, we could use that to render out the final gloss.
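
As a rough sketch of the tree-walking idea, the snippet below flattens a tree into text segments that each remember their ancestor tags, then re-wraps a segment in those tags. To keep it runnable outside a browser it uses plain objects (`{ tag, children }`, with strings as text nodes) as a stand-in for real DOM nodes; that shape and the names `collectSegments` and `renderSegment` are assumptions for illustration, not a proposed API.

```javascript
// Stand-in for DOM nodes: an element is { tag, children }, a text node is a
// string. A root wrapper uses tag: null so it adds no tag of its own.
function collectSegments(node, ancestors = []) {
  if (typeof node === 'string') {
    return node.length > 0 ? [{ text: node, tags: ancestors }] : [];
  }
  const path = node.tag ? [...ancestors, node.tag] : ancestors;
  return node.children.flatMap((child) => collectSegments(child, path));
}

// Wrap the text in its tags, innermost first, so the nesting order of the
// output mirrors the original tree.
function renderSegment({ text, tags }) {
  return tags.reduceRight((inner, tag) => `<${tag}>${inner}</${tag}>`, text);
}

// <em><strong>foo</strong> bar</em>
const tree = {
  tag: null,
  children: [
    { tag: 'em', children: [{ tag: 'strong', children: ['foo'] }, ' bar'] },
  ],
};

const segments = collectSegments(tree);
// → [{ text: 'foo', tags: ['em', 'strong'] }, { text: ' bar', tags: ['em'] }]

const renderedFoo = renderSegment(segments[0]);
// → '<em><strong>foo</strong></em>'
```

In a real implementation the segments would still need to be matched against the glossed words before rendering; this sketch only covers collecting and re-applying the tags.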

@jodamo5
Author

jodamo5 commented Apr 25, 2020

Yes, I agree. Your outline of Phase 2 sounds ideal. And Phase 1 will be a great interim step.
