Bug: Table attributes are parsed as Text #285
Comments
You are doing a recursive filter on the wikicode. Those text nodes are nested inside the tag nodes. The tree of your snippet looks correct:
>>> print(code.get_tree())
<
table
class
= wikitable
>
<
tr
>
<
th
style
= width:101px
>
heading1\n
</
th
>
<
th
style
= width:102px
>
heading2\n
</
th
>
</
tr
>
<
tr
data-test
= whatever
>
<
td
style
= width:201px
>
testing\n
</
td
>
</
tr
>
</
table
>
The table attributes are parsed as text because MediaWiki parses table attributes before producing the table. For example, table attributes can contain templates and other wikicode. So I would say this is not a bug in mwparserfromhell.
I don't insist on calling it a bug, but it still doesn't make sense when parsing pages with a bot. When I want to change something in the text, I explicitly don't want to touch table attributes.
Then you need to adjust your recursive iteration. For example:

import mwparserfromhell

# `code` is the Wikicode object returned by mwparserfromhell.parse()
for node in code.filter():
    parent = code.get_parent(node)
    # Skip nodes that belong to a tag's name or attributes rather than its contents
    if isinstance(parent, mwparserfromhell.nodes.tag.Tag) and not parent.contents.contains(node):
        print(f"parent of '{node}' is a tag and the node is not in the tag's contents, skipped")
        continue
    if isinstance(node, mwparserfromhell.nodes.tag.Tag):
        print(f'[{node.__class__.__name__}]({node.tag})\n"""\n', node, '\n"""')
    else:
        print(f'[{node.__class__.__name__}]\n"""\n', node, '\n"""')
@lahwaacz is right, but maybe there's room here for improvement. We might want to distinguish between Text nodes that are "visible"/rendered and Text nodes that are purely internal like template parameter names and tag attributes. It's basically the problem being solved by
There seems to be something weird going on when parsing tables. I get attributes as Text nodes.
Let's consider this example:
This is how the text is rendered by MW:
So there are 3 cells here. In HTML textContent would be heading1, heading2, testing.
But in MWPFH I get Text nodes for attributes. For example:
I would expect that those attributes wouldn't appear as Text nodes at all. They are already available in node.attributes, so that should be enough. So when traversing the cell above, it should only show: