Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

utils.astring: A couple of strip_console_codes fixes #135

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ldoktor
Copy link
Contributor

@ldoktor ldoktor commented Aug 19, 2024

I noticed the strip_console_code works quite oddly, let's add a selftest and fix the biggest issues.

Anyway the behaviour was here for a long time so this is here just to begin some discussion.

do not merge before a proper discussion!

we already set the "index += len(tmp_word) + 1" before we get to this
exception so we must not add the tmp_index.

Signed-off-by: Lukáš Doktor <[email protected]>
previously we only checked the console_code is found anywhere in the
text between the two \x1b and then we blindly removed first N
characters. Let's ensure we check the console code regexp from the
beginning.

Signed-off-by: Lukáš Doktor <[email protected]>
@ldoktor ldoktor marked this pull request as draft August 19, 2024 08:55
@smitterl
Copy link

Though I might be misunderstanding but matching only the start of a line doesn't seem to work, ref. #134 (comment)

these are sometimes generated by kernel, ignore them.

Signed-off-by: Lukáš Doktor <[email protected]>
Signed-off-by: Sebastian Mitterle <[email protected]>
@ldoktor
Copy link
Contributor Author

ldoktor commented Sep 9, 2024

@smitterl I have added a commit that should address the #134, could you please take a look on whether it behaves correctly and if not how to modify the selftest to cover your scenario? Then you can port it to your PR to get things rolling and later hopefully refine this as well.

@smitterl
Copy link

smitterl commented Dec 3, 2024

@ldoktor Sorry for the late response. With this state I get an IndexError in aexpect.

[stdlog] 2024-12-03 09:14:17,560 avocado.test stacktrace       L0042 ERROR| Reproduced traceback from: /var/ci/libvirt-ci/runtest/avocado-vt/avocado-vt/avocado_vt/test.py:298
[stdlog] 2024-12-03 09:14:17,562 avocado.test stacktrace       L0049 ERROR| Traceback (most recent call last):
[stdlog] 2024-12-03 09:14:17,562 avocado.test stacktrace       L0049 ERROR|   File "/root/aexpect/aexpect/utils/astring.py", line 56, in strip_console_codes
[stdlog] 2024-12-03 09:14:17,562 avocado.test stacktrace       L0049 ERROR|     special_code = re.findall(console_codes, tmp_word)[0]
[stdlog] 2024-12-03 09:14:17,562 avocado.test stacktrace       L0049 ERROR|                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
[stdlog] 2024-12-03 09:14:17,563 avocado.test stacktrace       L0049 ERROR| IndexError: list index out of range

@ldoktor
Copy link
Contributor Author

ldoktor commented Dec 5, 2024

@ldoktor Sorry for the late response. With this state I get an IndexError in aexpect.

[stdlog] 2024-12-03 09:14:17,560 avocado.test stacktrace       L0042 ERROR| Reproduced traceback from: /var/ci/libvirt-ci/runtest/avocado-vt/avocado-vt/avocado_vt/test.py:298
[stdlog] 2024-12-03 09:14:17,562 avocado.test stacktrace       L0049 ERROR| Traceback (most recent call last):
[stdlog] 2024-12-03 09:14:17,562 avocado.test stacktrace       L0049 ERROR|   File "/root/aexpect/aexpect/utils/astring.py", line 56, in strip_console_codes
[stdlog] 2024-12-03 09:14:17,562 avocado.test stacktrace       L0049 ERROR|     special_code = re.findall(console_codes, tmp_word)[0]
[stdlog] 2024-12-03 09:14:17,562 avocado.test stacktrace       L0049 ERROR|                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
[stdlog] 2024-12-03 09:14:17,563 avocado.test stacktrace       L0049 ERROR| IndexError: list index out of range

This is interesting, are you absolutely sure you're using this version of aexpect? The line 56 is inside try/except so IndexError shouldn't be possible, max it can raise ValueError with a sensible message. It'd be nice to catch the output, if possible so we can add this scenario to selftests.

@smitterl
Copy link

smitterl commented Jan 15, 2025

This is interesting, are you absolutely sure you're using this version of aexpect? The line 56 is inside try/except so IndexError shouldn't be possible, max it can raise ValueError with a sensible message. It'd be nice to catch the output, if possible so we can add this scenario to selftests.

Yes, I'm sure, I just tried again.

@smitterl
Copy link

@clebergnu @ldoktor I finally found a reproducer that works also on my laptop and doesn't require VMs. Please check #133 (comment)

@ldoktor
Copy link
Contributor Author

ldoktor commented Jan 16, 2025

Thanks for the reproducer \x1b[!p\x1b]104\x07\x1b[?7h. It seems at least not to crash with this version, but it removes the whole line except of \x07. This is because it contains yet another new code \x1b[?7h (and \x07 which might or might not be part of the \x1b]104\x07?) and therefore results in matching the remaining output.

To address the \x1b[?7h we could either list it or expect it to be a group of possible outputs and use something like |^\\[\\?\\d+h to match any \d followed by h. What do you think?

I'd suggest something like this:

diff --git a/aexpect/utils/astring.py b/aexpect/utils/astring.py
index 34936ff..907f4d1 100644
--- a/aexpect/utils/astring.py
+++ b/aexpect/utils/astring.py
@@ -37,7 +37,7 @@ def strip_console_codes(output, custom_codes=None):
     output = f"\x1b[m{output}"
     console_codes = "^%[G@8]|^\\[[@A-HJ-MPXa-hl-nqrsu\\`]"
     console_codes += "|^\\[[\\d;]+[HJKgqnrm]|^#8|^\\([B0UK]|^\\)"
-    console_codes += "|^\\[\\?2004[lh]|^c|^\\[!p|^\\]\\d+"
+    console_codes += "|^\\[\\?2004[lh]|^c|^\\[!p|^\\]\\d+|^\\[\\?\\d+h"
     if custom_codes is not None and custom_codes not in console_codes:
         console_codes += f"|{custom_codes}"
     while index < len(output):

which keeps the \x07 but since it's not colliding I'd rather keep it there as matching any hex number might be problematic...

@smitterl
Copy link

smitterl commented Jan 17, 2025

Thanks for the reproducer \x1b[!p\x1b]104\x07\x1b[?7h. It seems at least not to crash with this version, but it removes the whole line except of \x07. This is because it contains yet another new code \x1b[?7h (and \x07 which might or might not be part of the \x1b]104\x07?) and therefore results in matching the remaining output.

To address the \x1b[?7h we could either list it or expect it to be a group of possible outputs and use something like |^\\[\\?\\d+h to match any \d followed by h. What do you think?

From XTerm Control Sequences and xterm.js supported control sequences I deduce that \x1b[?7h corresponds to CSI command SM = CSI Pm h setting terminal mode where Pm in [2, 4, 12, 20], 7 isn't listed on that reference so I assume that effectively we can't be sure beforehand to cover all possible numbers. Overviewing the definition of other CSI sequences I conclude that almost any number can occur, therefore I think your solution adding |^\\[\\?\\d+h should be preferred.

\x1b]104\x07 is not listed as such by the xterm.js supported control sequences and could therefore be an erronoues intent to reset terminal color (OSC 104 BEL) but I wouldn't want to break our testing because a wrong code is used or I don't find the correct reference. I would want to remove \x07 in this specific case because for me the whole sequence is clearly a control sequence.

It raises for me somewhat the question if strip_console_codes should remove all control characters or even try to validate them. I assume we don't want to validate them. This for me is yet another argument to also remove \x07.

I'd suggest something like this:

diff --git a/aexpect/utils/astring.py b/aexpect/utils/astring.py
index 34936ff..907f4d1 100644
--- a/aexpect/utils/astring.py
+++ b/aexpect/utils/astring.py
@@ -37,7 +37,7 @@ def strip_console_codes(output, custom_codes=None):
     output = f"\x1b[m{output}"
     console_codes = "^%[G@8]|^\\[[@A-HJ-MPXa-hl-nqrsu\\`]"
     console_codes += "|^\\[[\\d;]+[HJKgqnrm]|^#8|^\\([B0UK]|^\\)"
-    console_codes += "|^\\[\\?2004[lh]|^c|^\\[!p|^\\]\\d+"
+    console_codes += "|^\\[\\?2004[lh]|^c|^\\[!p|^\\]\\d+|^\\[\\?\\d+h"
     if custom_codes is not None and custom_codes not in console_codes:
         console_codes += f"|{custom_codes}"
     while index < len(output):

which keeps the \x07 but since it's not colliding I'd rather keep it there as matching any hex number might be problematic...

@ldoktor
Copy link
Contributor Author

ldoktor commented Jan 20, 2025

Well I'd prefer removing all problematic ones, meaning the ones that might break output postprocessing. What do you think? Also about the rest of this PR, should I update it and move from draft?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants