forked from katef/libfsm
Sync from upstream katef/libfsm main #27
Merged
queue: Fix a read past the end of the queue.
Add fsm_intersect_charset(), fsm -U
These are used for reporting errors, and the only error here is reported from within the <count-range> action. In fact I think all errors involving lexical positions are reported within actions (i.e. during parse), because they are all syntax errors. And so I think we never need to annotate the AST with positions.
This also removes struct ast_pos, with no remaining uses.
Remove lexical position annotations for AST nodes
My thinking here is that we don't need to categorise these independently from anything else we consider unsupported, because to the caller the reason something is unsupported doesn't matter. This way there's only one situation for a caller to keep track of, and in particular to not need to remember to update whenever we introduce more unsupported things.
This also means that many existing tests will exercise fsm_vacuum.
Consolidate RE_EUNSUPPORTED syntax errors
Add fsm_vacuum, which reduces the state array when over-allocated.
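As a rough illustration of what "reducing the state array when over-allocated" can look like, here is a minimal sketch. The struct, its field names, and the shrink-to-fit policy are all assumptions made for this example; they are not libfsm's actual layout or the heuristic fsm_vacuum uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an FSM's state storage; not libfsm's real layout. */
struct state_array_sketch {
	int *states;        /* placeholder element type */
	size_t statecount;  /* slots in use */
	size_t statealloc;  /* slots allocated */
};

/* Shrink the allocation to fit the slots in use.
 * Returns 1 on success (including "nothing to do"), 0 on allocation failure. */
int
vacuum_sketch(struct state_array_sketch *a)
{
	/* keep at least one slot so realloc() is never called with size 0 */
	const size_t want = a->statecount > 0 ? a->statecount : 1;
	int *tmp;

	assert(a->statecount <= a->statealloc);

	if (a->statealloc <= want) {
		return 1; /* nothing to reclaim */
	}

	tmp = realloc(a->states, want * sizeof *tmp);
	if (tmp == NULL) {
		return 0; /* the original, larger array is still valid */
	}

	a->states = tmp;
	a->statealloc = want;
	return 1;
}
```

On failure the caller can carry on with the larger array, since realloc() leaves the original block untouched when it returns NULL.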
Add fsm_new_statealloc()
This is a follow-up to katef#465:
- Also add the extra `void *opaque` to vmc's codegen.
- Add a `(void)` cast to suppress warnings if the extra opaque void pointer isn't used by the generated code.
…nd-warning Add `void* opaque` to vmc codegen too, disable unused warning.
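The `(void)` cast mentioned above is a common C idiom for an intentionally unused parameter. The function below is hand-written for illustration and is not actual vmc output; the name `match_sketch` and its signature are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shape of a generated matcher that receives an opaque pointer
 * it does not need. */
int
match_sketch(const char *b, const char *e, void *opaque)
{
	/* This matcher never touches opaque; the cast avoids an
	 * unused-parameter warning without changing behaviour. */
	(void) opaque;

	/* accept exactly the two-byte input "ab" */
	return e - b == 2 && b[0] == 'a' && b[1] == 'b';
}
```

Generated code that does use the pointer is unaffected by the cast, so a code generator can emit it unconditionally.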
This matches the standard behaviour. I'd disallowed it to avoid confusion, but in rx I do actually want to realloc down to nothing and return NULL when size is 0. And that's what free() does, but it seems cumbersome to have a conditional around that in the caller.
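A minimal sketch of the behaviour described here: a checked realloc wrapper that frees and returns NULL when asked for zero bytes. The name `xrealloc_sketch` and the abort-on-failure policy are assumptions for this example, not the project's actual wrapper. The sketch handles size 0 explicitly rather than passing it through to realloc().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical checked wrapper: never returns NULL for a non-zero size. */
void *
xrealloc_sketch(void *p, size_t sz)
{
	void *q;

	if (sz == 0) {
		/* "realloc down to nothing": release the block and hand back NULL */
		free(p);
		return NULL;
	}

	q = realloc(p, sz);
	if (q == NULL) {
		perror("realloc");
		abort();
	}

	return q;
}
```

With this shape a caller can shrink a buffer to empty with `p = xrealloc_sketch(p, 0);` and needs no conditional around free().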
There's no need to have xstrdup() behave differently here, it's just confusing.
Add xstrndup
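For readers unfamiliar with the strndup family, here is a sketch of what a checked variant might look like. It assumes xstrndup mirrors POSIX strndup() (copy at most n bytes, always NUL-terminate) and aborts on allocation failure; both are assumptions, and the name `xstrndup_sketch` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical checked duplicate of at most n bytes of s. */
char *
xstrndup_sketch(const char *s, size_t n)
{
	/* stop at the first NUL within the first n bytes, if there is one */
	const char *end = memchr(s, '\0', n);
	const size_t len = end != NULL ? (size_t) (end - s) : n;
	char *copy;

	copy = malloc(len + 1);
	if (copy == NULL) {
		perror("malloc");
		abort();
	}

	memcpy(copy, s, len);
	copy[len] = '\0';

	return copy;
}
```

Using memchr() and memcpy() keeps the sketch within standard C, since strndup() itself is POSIX rather than ISO C prior to C23.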
This doesn't help for katef#317, but whatever the solution is there, asserting about it is the wrong thing to do. Spotted by @classabbyamp, thank you.
I have extremely broken the iprange example. I am very unsure what's going on with this program.
A few small bugfixes
rx, a program for compiling sets of regular expressions
Happier cache lines
Most of these are used in fsm_detect_required_characters. Also add #includes for standard headers being used.
This inspects the DFA to determine which characters must appear in any matching input.
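One way to make "characters that must appear in any matching input" concrete: a byte is required if, once every edge labelled with it is ignored, no end state is reachable from the start. The sketch below implements that definition on a small table-driven DFA. It is not the algorithm fsm_detect_required_characters uses; the types, names, and fixed state limit are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_MAX_STATES 8
#define SK_NONE (-1)

/* Hypothetical table-driven DFA; not libfsm's representation. */
struct dfa_sketch {
	int state_count;
	int start;
	bool is_end[SK_MAX_STATES];
	int next[SK_MAX_STATES][256]; /* SK_NONE where there is no edge */
};

void
dfa_init_sketch(struct dfa_sketch *d, int state_count, int start)
{
	int s, c;

	d->state_count = state_count;
	d->start = start;
	for (s = 0; s < SK_MAX_STATES; s++) {
		d->is_end[s] = false;
		for (c = 0; c < 256; c++) {
			d->next[s][c] = SK_NONE;
		}
	}
}

/* Depth-first search from the start state, skipping edges labelled `banned`. */
static bool
end_reachable_without(const struct dfa_sketch *d, int banned)
{
	bool seen[SK_MAX_STATES] = { false };
	int stack[SK_MAX_STATES]; /* each state is pushed at most once */
	int top = 0;

	seen[d->start] = true;
	stack[top++] = d->start;

	while (top > 0) {
		const int s = stack[--top];
		int c;

		if (d->is_end[s]) {
			return true;
		}

		for (c = 0; c < 256; c++) {
			const int t = d->next[s][c];
			if (c == banned || t == SK_NONE || seen[t]) {
				continue;
			}
			seen[t] = true;
			stack[top++] = t;
		}
	}

	return false;
}

/* required[c] is set when every accepted input contains byte c.
 * For a DFA that accepts nothing, every byte is vacuously required. */
void
required_chars_sketch(const struct dfa_sketch *d, bool required[256])
{
	int c;

	for (c = 0; c < 256; c++) {
		required[c] = !end_reachable_without(d, c);
	}
}
```

For a DFA matching `ab*c`, this marks `a` and `c` as required and leaves `b` unmarked, since `ac` matches without it. The cost is one traversal per byte value, which is fine for a sketch but likely slower than a purpose-built analysis.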
fsm_endid_get's id_buf_count argument is expected to "have enough cells (according to id_buf_count)", but if it has more than enough, stale data can get sorted into the result. Add a test, tests/endids/endids_reused_buffer.c
bugfix: fsm_endid_get should sort with result count, not buffer size.
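The class of bug described above can be shown with a small stand-alone sketch: when the caller's buffer is larger than the result, sorting by the buffer's capacity pulls stale cells into the sorted range. The function below is a hypothetical stand-in, not fsm_endid_get itself, and its signature is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int endid_sketch;

static int
cmp_endid_sketch(const void *pa, const void *pb)
{
	const endid_sketch a = *(const endid_sketch *) pa;
	const endid_sketch b = *(const endid_sketch *) pb;

	return a < b ? -1 : a > b ? 1 : 0;
}

/* Copy src into buf and sort the result.
 * Returns 1 on success, 0 if buf is too small. */
int
endid_get_sketch(const endid_sketch *src, size_t src_count,
	size_t buf_count, endid_sketch *buf, size_t *written)
{
	size_t i;

	if (buf_count < src_count) {
		return 0;
	}

	for (i = 0; i < src_count; i++) {
		buf[i] = src[i];
	}

	/* Sort only the cells just written. Passing buf_count here instead
	 * would sort leftover cells from an earlier call into the result. */
	qsort(buf, src_count, sizeof buf[0], cmp_endid_sketch);

	*written = src_count;
	return 1;
}
```

With a four-cell buffer pre-filled with 1s and a source of {5, 3}, the first two cells become {3, 5} and the remaining cells are left alone. Sorting by capacity would instead yield {1, 1, 3, 5}, so the caller reading two results would see {1, 1}.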
Update to Unicode 16.0
Add a regression test showing possible endid false negatives when FSMs were trimmed (called from fsm_minimise) without updating endids.
`struct bm` isn't part of the public API, so use a uint64_t[4] instead, and add an optional parameter for the count. Update the tests.
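A `uint64_t[4]` holds 256 bits, one per byte value, which is why it can replace a private bitmap type in a public signature. The sketch below shows that representation with an optional count out-parameter. The helper names and the example function are hypothetical and are not the libfsm interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Set the bit for byte value c in a 256-bit map. */
void
bitmap_set_sketch(uint64_t bitmap[4], unsigned char c)
{
	bitmap[c / 64] |= (uint64_t) 1 << (c % 64);
}

/* Return 1 if the bit for byte value c is set, 0 otherwise. */
int
bitmap_get_sketch(const uint64_t bitmap[4], unsigned char c)
{
	return (int) ((bitmap[c / 64] >> (c % 64)) & 1);
}

/* Record which byte values occur in s. If count is non-NULL, also report
 * how many distinct values were seen. */
void
distinct_bytes_sketch(const char *s, uint64_t bitmap[4], size_t *count)
{
	size_t n = 0;
	size_t i;

	for (i = 0; i < 4; i++) {
		bitmap[i] = 0;
	}

	for (; *s != '\0'; s++) {
		const unsigned char c = (unsigned char) *s;
		if (!bitmap_get_sketch(bitmap, c)) {
			bitmap_set_sketch(bitmap, c);
			n++;
		}
	}

	if (count != NULL) {
		*count = n;
	}
}
```

Callers that only want the bitmap pass NULL for `count`, which is the usual C convention for an optional out-parameter.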
Move the end_id array and its count into a struct for state metadata, and rename access throughout to end_ids and end_id_count. Upcoming changes for eager output IDs will soon be passing more info to all of these callbacks, but only callers making use of those fields need to care. Instead of making callers add more `(void) param;` declarations all over the place to avoid warnings, just pass in a metadata struct pointer. Also, "count" is a pretty generic name and what it refers to will soon be ambiguous. This should not be a functional change on its own.
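A sketch of the pattern described above: the callback receives a pointer to a metadata struct instead of separate array and count parameters, so fields can be added later without changing every callback's signature. The struct and callback names here are hypothetical; only `end_ids` and `end_id_count` come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-state metadata passed to callbacks. New fields can be
 * appended here without touching existing callback signatures. */
struct state_metadata_sketch {
	const unsigned *end_ids;
	size_t end_id_count;
};

/* Hypothetical callback type: one struct pointer rather than a parameter
 * per piece of metadata. */
typedef int accept_cb_sketch(const struct state_metadata_sketch *md,
	void *opaque);

/* Example callback that only cares about the count; it accumulates into
 * the size_t that opaque points at. */
int
count_end_ids_sketch(const struct state_metadata_sketch *md, void *opaque)
{
	size_t *total = opaque;

	*total += md->end_id_count;
	return 1;
}
```

A callback that ignores the metadata entirely needs at most one `(void) md;`, however many fields the struct later grows.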
It's only used for the assertion.
It's already in fsm/fsm.h.
The IR struct is about to get another id & count pair. This loses storing the count as a 31-bit bitfield, but if the goal for that is saving memory then the ids array allocation could be replaced with a struct that contains the count, and then each IR without endids will save more space than the current approach.
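The alternative floated here, moving the count into the ids allocation, could look roughly like the following. This is only a sketch of the idea using a C99 flexible array member; the type and function names are hypothetical and it is not the IR's actual layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical id list: the count lives alongside the ids. */
struct id_list_sketch {
	size_t count;
	unsigned ids[]; /* C99 flexible array member */
};

/* Hypothetical IR state: a single pointer, NULL when there are no end ids,
 * so states without ids pay for neither a count nor an array. */
struct ir_state_sketch {
	struct id_list_sketch *endids;
};

/* Returns NULL for an empty list. NULL is also returned on allocation
 * failure, so a real implementation would need to tell the two apart. */
struct id_list_sketch *
id_list_new_sketch(const unsigned *ids, size_t count)
{
	struct id_list_sketch *l;

	if (count == 0) {
		return NULL;
	}

	l = malloc(sizeof *l + count * sizeof l->ids[0]);
	if (l == NULL) {
		return NULL;
	}

	l->count = count;
	memcpy(l->ids, ids, count * sizeof l->ids[0]);

	return l;
}

size_t
ir_state_endid_count_sketch(const struct ir_state_sketch *s)
{
	return s->endids == NULL ? 0 : s->endids->count;
}
```

The trade-off is an extra indirection to read the count for states that do have ids, in exchange for a smaller per-state struct.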
…-must-appear-in-input-to-match Add fsm_detect_required_characters
…ld-remap-endids fsm_compact_states must remap endids, to avoid dangling references.
…data-args-into-struct print API: Box end_ids and end_id_count in a struct for callbacks.
katef approved these changes on Oct 9, 2024
Sync from upstream; subsequent changes will depend on interfaces added here.