Flaw in Python re.split #10

Open
funderburkjim opened this issue May 28, 2024 · 0 comments
During the work under #9, a flaw was discovered in Python's re.split() function.
A solution was found through a conversation with Copilot on Windows 11.
See splitproblem for some details.
The function 'split1' in test1.py provides what looks like a quite general replacement for re.split(regex,text,re.DOTALL), built on re.finditer().
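
For readers without test1.py at hand, here is a minimal sketch of how a finditer-based replacement could look. It is not the split1 code from the repository; the name split_with_finditer is hypothetical, and the sketch assumes the regex never matches the empty string.

```python
import re

def split_with_finditer(regex, text, flags=0):
    """Hypothetical stand-in for split1: split text on every match of regex."""
    parts = []
    pos = 0
    for m in re.finditer(regex, text, flags):
        parts.append(text[pos:m.start()])  # text between the previous match and this one
        parts.extend(m.groups())           # keep captured groups, as re.split does
        pos = m.end()
    parts.append(text[pos:])               # remainder after the last match
    return parts

print(split_with_finditer(r"<(\d)>", "a<1>b\n<2>c"))
```

Because the loop visits every match, the result does not depend on how many matches there are.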

To see the problem,

python test1.py 5 # no problem for any number from 1 through 16.
python test1.py 17 # shows the problem.  Problem occurs for inputs > 16.

The code was run with Python version 3.9.1.

I hope Python will find some way of identifying (perhaps by an exception?) when re.split(regex,text,re.DOTALL) gives THE WRONG ANSWER.

AFAIK: The problem (wrong answer) occurs when

  • there is a capture group in the regex
  • the input text has \n line breaks (the number of breaks doesn't matter)
    • if there are no line breaks, then re.split works fine.
  • the input text matches the capture group more than 16 times.
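
One possible explanation for the threshold of 16, offered as a hypothesis and not checked against test1.py: the signature is re.split(pattern, string, maxsplit=0, flags=0), so a third positional argument is read as maxsplit, not flags, and re.DOTALL has the integer value 16. The sketch below uses made-up sample text and a made-up regex to show the effect; the keyword call maxsplit=int(re.DOTALL) is what the positional call re.split(regex,text,re.DOTALL) amounts to.

```python
import re

# re.DOTALL is an integer flag whose value is 16.
assert int(re.DOTALL) == 16

# Hypothetical sample data: 20 numbered records separated by line breaks.
rx = r"<(\d+)>"
text = "".join(f"<{i}>body{i}\n" for i in range(20))

# Same effect as re.split(rx, text, re.DOTALL): at most 16 splits are made.
limited = re.split(rx, text, maxsplit=int(re.DOTALL))

# Passing the flag by keyword leaves maxsplit at 0 (no limit).
full = re.split(rx, text, flags=re.DOTALL)

print(len(limited), len(full))
```

If this is what happens in test1.py, calling re.split(regex, text, flags=re.DOTALL) should give the expected result without a replacement function.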