-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mozpsl fixes from @morkrost #24
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThe changes involve updates to the Changes
Possibly related PRs
Suggested reviewers
Poem
Warning Rate limit exceeded@jschlyter has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 13 minutes and 42 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
tests/test_dns_mozpsl.py (1)
29-29
: Consider adding more IDN test casesWhile the trailing period addition is correct, the test suite could benefit from additional IDN test cases to ensure robust handling of various internationalized domain scenarios.
Consider adding test cases for:
- Right-to-left scripts (e.g., Arabic domains)
- Mixed-script domains
- IDN TLDs (e.g., .рф, .中国)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
dnstapir/dns/mozpsl.py
(3 hunks)tests/test_dns_mozpsl.py
(2 hunks)
🔇 Additional comments (6)
tests/test_dns_mozpsl.py (2)
12-15
: LGTM! Consistent FQDN formatting
The updated assertions correctly expect trailing periods in the output, which aligns with DNS standards for Fully Qualified Domain Names (FQDNs).
16-19
: Verify consistent handling of complex subdomains
While the changes correctly add trailing periods to both parts of the tuple, let's verify that similar complex subdomain patterns are handled consistently across the codebase.
✅ Verification successful
Complex subdomain handling is consistent across test cases
The test cases show consistent handling of trailing periods across various complex domain patterns:
- Simple domains (e.g., "www.ck.")
- Government domains (e.g., "www.something.gov.ck.")
- Multi-level subdomains (e.g., "www.something.or.other.microsoft.com.")
- International domains (e.g., "microsoft.com.br.")
- IDN domains (e.g., "www.xn--mnchen-3ya.de.")
- AWS complex subdomains (e.g., "something.emrstudio-prod.us-gov-east-1.amazonaws.com.")
The AWS subdomain test case follows the same pattern as other complex domain tests, maintaining consistency in the handling of trailing periods and subdomain extraction.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for other complex subdomain patterns to ensure consistent handling
# Look for other test cases with similar patterns
rg -l "\.amazonaws\." --type py
# Look for actual usage patterns in the codebase
rg "coredomain\([^)]*amazonaws" --type py
Length of output: 219
Script:
#!/bin/bash
# Let's check for any other similar complex domain handling tests and patterns
# Look for other test cases with complex domain patterns
rg "coredomain\([^)]*\." -A 2 --type py
# Check the implementation of coredomain to understand the handling
ast-grep --pattern 'def coredomain($$$)'
Length of output: 1139
dnstapir/dns/mozpsl.py (4)
12-12
: LGTM: Improved type annotation clarity
The change from Self
to explicit TrieNode
type makes the code more maintainable and clearer to understand.
56-63
: LGTM: Improved control flow and documentation
The restructured control flow is more logical:
- Check label existence first
- Update current node
- Handle ICANN status updates
The added comment about None
handling improves code clarity.
134-150
: Verify PSL standards compliance for trailing periods
The changes improve code readability and ensure consistent output formatting. However, please verify that always adding a trailing period aligns with PSL standards and expected behavior.
✅ Verification successful
Based on the gathered evidence, I can now provide the final response:
Trailing periods in domain output are correct and verified
The implementation is correct and consistent with PSL standards:
- Test cases consistently expect and verify trailing periods in the output (
tests/test_dns_mozpsl.py
) - The
coredomain
method strips trailing periods from input (domain.rstrip(".")
) but adds them back in output, showing intentional handling - All test cases, including edge cases and IDN domains, validate this behavior
The changes maintain consistency with the existing test suite and implementation patterns.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check test cases and documentation for trailing period requirements
echo "Checking test cases for trailing period handling:"
rg -A 5 "coredomain.*\." tests/
echo "Checking documentation or comments about trailing periods:"
rg -A 5 "trailing.*period|suffix.*format" .
Length of output: 1753
Script:
#!/bin/bash
# Let's check the implementation and documentation of the PSL library
echo "Checking PSL implementation details:"
rg -A 10 "class PSL" .
# Check if there are any comments or documentation about domain format
echo "Checking domain format documentation:"
rg -B 2 -A 5 "def coredomain" .
# Look for any README or documentation files
echo "Checking documentation files:"
fd -e md -e rst -x cat {}
Length of output: 883
50-51
: Verify the intentional removal of wildcard support
The commented code suggests there was previously support for wildcard domains (*
). Please confirm if this removal was intentional, as it might affect the PSL parsing capabilities.
✅ Verification successful
Let me analyze the test files and implementation to understand if wildcard support was intentionally removed or if it's handled differently.
Wildcard handling has been intentionally removed from search but remains in PSL loading
The codebase shows that wildcard handling is still present in the PSL loading logic (if line[0] == "*"
) but has been intentionally commented out in the Trie search method. This appears to be a deliberate change as:
- The test cases don't include any wildcard domain scenarios
- The PSL loading still handles wildcards by stripping them (
line[2:]
) - The search method's control flow has been simplified to focus on exact matches
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if wildcard domains are still being processed in the PSL loading
rg -A 5 "^\s*\*" tests/
Length of output: 25
Script:
#!/bin/bash
# Let's check the test files and implementation for any wildcard handling
rg -l "wildcard|[^a-zA-Z]\*[^a-zA-Z]" dnstapir/dns/
# Check test files for PSL related tests that might show wildcard usage
fd -e py "test.*psl" tests/
# Look for PSL test data files
fd "public_suffix" .
# Check if there are any PSL test data files with wildcards
fd -t f . -x cat {} \; | rg "^\*\."
Length of output: 193
Script:
#!/bin/bash
# Let's examine the content of the test file and mozpsl implementation
cat tests/test_dns_mozpsl.py
# Check the implementation file for context around wildcard handling
rg -B 3 -A 3 "\*" dnstapir/dns/mozpsl.py
Length of output: 1857
ping @morkrost |
Summary by CodeRabbit
New Features
Bug Fixes
coredomain
method, ensuring consistency in results.