
Sync from upstream katef/libfsm main #27

Merged
198 commits merged on Oct 9, 2024


silentbicycle

Sync from upstream. Subsequent changes will depend on interfaces added here.

silentbicycle and others added 30 commits May 15, 2024 15:09
queue: Fix a read past the end of the queue.
These position annotations are used for reporting errors, and the only such error here is reported from within the <count-range> action. In fact, I think all errors involving lexical positions are reported within actions (i.e. during parsing), because they are all syntax errors. So I think we never need to annotate the AST with positions.
This also removes struct ast_pos, with no remaining uses.
Remove lexical position annotations for AST nodes
My thinking here is that we don't need to categorise these independently from anything else we consider unsupported, because the reason something is unsupported doesn't matter to the caller. This way there's only one situation for a caller to keep track of, and in particular, callers don't need to remember to update their handling whenever we introduce more unsupported things.
This also means that many existing tests will exercise fsm_vacuum.
Consolidate RE_EUNSUPPORTED syntax errors
Add fsm_vacuum, which reduces the state array when over-allocated.
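As a rough illustration of what "reducing the state array when over-allocated" means, here is a minimal sketch of shrinking a growable array down to its used length. The `struct state_array` type and `state_array_vacuum` function are hypothetical stand-ins, not libfsm's `struct fsm` or the real `fsm_vacuum`.

```c
#include <stdlib.h>

/* Hypothetical growable array; not libfsm's internal state storage. */
struct state_array {
	int *states;     /* stand-in element type */
	size_t count;    /* elements in use */
	size_t capacity; /* elements allocated */
};

/* Shrink the allocation to fit count when it is over-allocated.
 * Returns 1 on success, 0 if realloc failed (array left unchanged). */
int
state_array_vacuum(struct state_array *a)
{
	if (a->capacity <= a->count) {
		return 1; /* not over-allocated: nothing to reclaim */
	}
	if (a->count == 0) {
		/* release everything explicitly rather than calling realloc(p, 0) */
		free(a->states);
		a->states = NULL;
		a->capacity = 0;
		return 1;
	}
	int *p = realloc(a->states, a->count * sizeof *p);
	if (p == NULL) {
		return 0; /* shrink failed; the original allocation is still valid */
	}
	a->states = p;
	a->capacity = a->count;
	return 1;
}
```

Shrinking is only worthwhile when the gap between capacity and count is significant, since each call may copy the whole array.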
This is a follow-up to katef#465:

- Also add the extra `void *opaque` to vmc's codegen.

- Add a `(void)` cast to suppress warnings if the extra opaque
  void pointer isn't used by the generated code.
…nd-warning

Add `void* opaque` to vmc codegen too, disable unused warning.
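To show why the `(void)` cast matters, here is a hand-written sketch of the general shape of a generated matcher that takes an extra opaque pointer it never reads. This is not actual vmc or libfsm output; `match_ab_star` and its signature are hypothetical, and the DFA it encodes is simply `^ab*$`.

```c
#include <stddef.h>

/* Hypothetical generated matcher for ^ab*$ (not real vmc output).
 * The trailing opaque pointer is passed through for the caller's use;
 * this code never reads it, so it is cast to void to keep
 * -Wunused-parameter quiet. Returns 1 on a match, 0 otherwise. */
int
match_ab_star(const char *s, size_t n, void *opaque)
{
	(void) opaque;

	int state = 0;
	for (size_t i = 0; i < n; i++) {
		switch (state) {
		case 0: /* expecting the leading 'a' */
			if (s[i] != 'a') {
				return 0;
			}
			state = 1;
			break;
		case 1: /* accepting state: only 'b' may follow */
			if (s[i] != 'b') {
				return 0;
			}
			break;
		}
	}
	return state == 1;
}
```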
This matches the standard behaviour. I'd disallowed it to avoid confusion, but in rx I do actually want to realloc down to nothing and return NULL when size is 0. That's effectively what free() does, but having a conditional around it in the caller seems cumbersome.
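A minimal sketch of the "realloc down to nothing" behaviour described above, assuming a wrapper in the style of an allocator hook. `resize_or_free` is a hypothetical name, not libfsm's allocator API. It handles size 0 explicitly instead of relying on `realloc(p, 0)`, whose behaviour has varied across C standard revisions and implementations.

```c
#include <stdlib.h>

/* Hypothetical wrapper (not libfsm's API): resize p to size bytes.
 * A size of 0 frees p and returns NULL, so callers can shrink to
 * nothing without a conditional around free() of their own. */
void *
resize_or_free(void *p, size_t size)
{
	if (size == 0) {
		free(p);
		return NULL;
	}
	return realloc(p, size); /* realloc(NULL, n) behaves like malloc(n) */
}
```

Usage: `buf = resize_or_free(buf, len * sizeof *buf);` works for any `len`, including 0.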
There's no need to have xstrdup() behave differently here; it's just confusing.
katef and others added 26 commits August 25, 2024 20:43
This doesn't help with katef#317, but whatever the solution is there, asserting about it is the wrong thing to do.

Spotted by @classabbyamp, thank you
I have extremely broken the iprange example. I am very unsure what's going on with this program.
Spotted by Dan Kegel, thank you.
rx, a program for compiling sets of regular expressions
Most of these are used in fsm_detect_required_characters.

Also add #includes for standard headers being used.
This inspects the DFA to determine which characters must appear in
any matching input.
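One straightforward way to decide which characters must appear in every match is a reachability check per character: character `c` is required exactly when no end state is reachable from the start once every edge labelled `c` is removed. The sketch below uses a hypothetical toy DFA struct (`struct dfa`, `required_chars`) and is not necessarily the algorithm `fsm_detect_required_characters` uses. Note that if the DFA matches nothing at all, every character is reported as (vacuously) required.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_STATES 16

/* Hypothetical toy DFA: next[s][c] is the target state, or -1 if none. */
struct dfa {
	size_t state_count;
	size_t start;
	bool end[MAX_STATES];
	int next[MAX_STATES][256];
};

/* Depth-first search from the start state, skipping edges labelled
 * `banned`. Each state is pushed at most once, so MAX_STATES suffices. */
static bool
end_reachable_without(const struct dfa *d, unsigned char banned)
{
	bool seen[MAX_STATES] = { false };
	size_t stack[MAX_STATES];
	size_t top = 0;

	stack[top++] = d->start;
	seen[d->start] = true;
	while (top > 0) {
		size_t s = stack[--top];
		if (d->end[s]) {
			return true;
		}
		for (int c = 0; c < 256; c++) {
			int t = d->next[s][c];
			if (c == banned || t < 0 || seen[t]) {
				continue;
			}
			seen[t] = true;
			stack[top++] = (size_t) t;
		}
	}
	return false;
}

/* required[c] is set when every accepting path uses an edge labelled c. */
void
required_chars(const struct dfa *d, bool required[256])
{
	assert(d->state_count <= MAX_STATES && d->start < d->state_count);
	for (int c = 0; c < 256; c++) {
		required[c] = !end_reachable_without(d, (unsigned char) c);
	}
}
```

This costs one traversal per byte value; a real implementation may prefer a single pass or a step limit on large DFAs.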
fsm_endid_get's id_buf argument is expected to "have enough
cells (according to id_buf_count)", but if it has more than enough,
stale data from the unused cells can get sorted into the result.

Add a test, tests/endids/endids_reused_buffer.c
bugfix: fsm_endid_get should sort with result count, not buffer size.
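A minimal sketch of the pattern behind this fix: copy results into a caller-supplied buffer, then sort only the cells actually written. `copy_sorted_ids` and `end_id_t` are hypothetical names for illustration; this is not the body of `fsm_endid_get`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned end_id_t; /* hypothetical stand-in for the end ID type */

static int
cmp_end_id(const void *a, const void *b)
{
	end_id_t x = *(const end_id_t *) a;
	end_id_t y = *(const end_id_t *) b;
	return (x > y) - (x < y);
}

/* Copy id_count IDs into buf (buf_count cells) and sort them.
 * Returns false if buf is too small; otherwise sets *written. */
bool
copy_sorted_ids(const end_id_t *ids, size_t id_count,
    end_id_t *buf, size_t buf_count, size_t *written)
{
	if (id_count > buf_count) {
		return false; /* buffer too small */
	}
	for (size_t i = 0; i < id_count; i++) {
		buf[i] = ids[i];
	}
	/* Sort only the id_count cells just written. Passing buf_count here
	 * would also sort whatever stale values sit in the unused tail. */
	qsort(buf, id_count, sizeof buf[0], cmp_end_id);
	*written = id_count;
	return true;
}
```

With a zero-filled 5-cell buffer and IDs `{7, 3}`, sorting all 5 cells would put the stale zeros first; sorting 2 cells yields `{3, 7}`.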
Add a regression test showing possible endid false negatives when FSMs were trimmed
(called from fsm_minimise) without updating endids.
`struct bm` isn't part of the public API, so use a uint64_t[4]
instead, and add an optional parameter for the count.

Update the tests.
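A 256-bit character set fits in a plain `uint64_t[4]`, one bit per byte value, so no private struct needs to cross the API boundary. The helpers below (`charset_add`, `charset_has`, `charset_count`) are hypothetical illustrations of indexing such an array, not functions libfsm exports.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte value c lives in word c / 64, at bit c % 64. */
void
charset_add(uint64_t set[4], unsigned char c)
{
	set[c / 64] |= (uint64_t) 1 << (c % 64);
}

bool
charset_has(const uint64_t set[4], unsigned char c)
{
	return (set[c / 64] >> (c % 64)) & 1;
}

/* Count members; a function filling the set could report this through
 * an optional (nullable) size_t * parameter. */
size_t
charset_count(const uint64_t set[4])
{
	size_t n = 0;
	for (size_t i = 0; i < 4; i++) {
		/* w &= w - 1 clears the lowest set bit on each iteration */
		for (uint64_t w = set[i]; w != 0; w &= w - 1) {
			n++;
		}
	}
	return n;
}
```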
Move the end_id array and its count into a struct for state metadata,
and rename access throughout to end_ids and end_id_count.

Upcoming changes for eager output IDs will soon be passing more info to
all of these callbacks, but only callers making use of those fields need
to care. Instead of making callers add more `(void) param;` declarations
all over the place to avoid warnings, just pass in a metadata struct
pointer. Also, "count" is a pretty generic name and what it refers to
will soon be ambiguous.

This should not be a functional change on its own.
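A sketch of the idea of passing one metadata struct pointer instead of separate parameters. The names here (`struct state_metadata`, `state_cb`, `visit_states`, `sum_end_ids`) are hypothetical; libfsm's actual print API struct and callback signatures may differ. The point is that a callback reads only the fields it needs, and adding a field later does not change its signature.

```c
#include <stddef.h>

typedef unsigned end_id_t; /* hypothetical stand-in for the end ID type */

/* Per-state metadata boxed into one struct. */
struct state_metadata {
	const end_id_t *end_ids;
	size_t end_id_count;
};

/* Callback returns nonzero to continue, 0 to stop. */
typedef int state_cb(const struct state_metadata *md, void *opaque);

/* Returns n if every callback returned nonzero, otherwise the index of
 * the state where the callback asked to stop. */
size_t
visit_states(const struct state_metadata *states, size_t n,
    state_cb *cb, void *opaque)
{
	size_t i;
	for (i = 0; i < n; i++) {
		if (!cb(&states[i], opaque)) {
			break;
		}
	}
	return i;
}

/* Example callback: sums every end ID into *(unsigned long *) opaque. */
int
sum_end_ids(const struct state_metadata *md, void *opaque)
{
	unsigned long *total = opaque;
	for (size_t i = 0; i < md->end_id_count; i++) {
		*total += md->end_ids[i];
	}
	return 1;
}
```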
It's only used for the assertion.
It's already in fsm/fsm.h.
The IR struct is about to get another id & count pair.

This loses storing the count as a 31-bit bitfield, but if the
goal of that was saving memory, then the ids array allocation could
be replaced with a struct that contains the count, and each IR
struct without endids would save more space than with the current approach.
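The alternative suggested above could look roughly like this: the count and the IDs share one allocation via a C99 flexible array member, and an IR node holds a single pointer that stays NULL when there are no endids. All names (`struct end_id_set`, `struct ir_node`, `end_id_set_new`) are hypothetical, not libfsm's IR definitions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned end_id_t; /* hypothetical stand-in for the end ID type */

/* Count and IDs in one allocation. */
struct end_id_set {
	size_t count;
	end_id_t ids[]; /* flexible array member */
};

/* Hypothetical IR node: one pointer, NULL when there are no endids. */
struct ir_node {
	/* ... other IR fields would go here ... */
	struct end_id_set *end_ids;
};

/* Returns NULL when count is 0 (nothing to store) or on failure. */
struct end_id_set *
end_id_set_new(const end_id_t *ids, size_t count)
{
	if (count == 0) {
		return NULL;
	}
	if (count > (SIZE_MAX - sizeof(struct end_id_set)) / sizeof(end_id_t)) {
		return NULL; /* allocation size would overflow */
	}
	struct end_id_set *s = malloc(sizeof *s + count * sizeof s->ids[0]);
	if (s == NULL) {
		return NULL;
	}
	s->count = count;
	memcpy(s->ids, ids, count * sizeof s->ids[0]);
	return s;
}
```

A node without endids then costs one pointer, instead of a pointer plus a count field.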
…-must-appear-in-input-to-match

Add fsm_detect_required_characters
…ld-remap-endids

fsm_compact_states must remap endids, to avoid dangling references.
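Why compaction must touch endids: if endid entries record a state index, removing states shifts the indices of the states that remain, so old entries point at the wrong state or past the end. A minimal sketch, assuming a hypothetical flat list of `(state, id)` entries and a `keep[]` flag per state; this is not libfsm's endid storage or the real `fsm_compact_states`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical endid record: which state it belongs to, and its ID. */
struct endid_entry {
	size_t state;
	unsigned id;
};

#define REMOVED ((size_t) -1)

/* Fill map[old] with each kept state's new index, or REMOVED.
 * Returns the number of states after compaction. */
size_t
build_state_map(const bool *keep, size_t state_count, size_t *map)
{
	size_t next = 0;
	for (size_t s = 0; s < state_count; s++) {
		map[s] = keep[s] ? next++ : REMOVED;
	}
	return next;
}

/* Rewrite entries in place through map, dropping entries whose state
 * was removed. Returns the number of entries that remain. */
size_t
remap_endids(struct endid_entry *entries, size_t entry_count, const size_t *map)
{
	size_t out = 0;
	for (size_t i = 0; i < entry_count; i++) {
		size_t new_state = map[entries[i].state];
		if (new_state == REMOVED) {
			continue; /* state is gone, so its endids go too */
		}
		entries[out].state = new_state;
		entries[out].id = entries[i].id;
		out++;
	}
	return out;
}
```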
…data-args-into-struct

print API: Box end_ids and end_id_count in a struct for callbacks.
silentbicycle requested a review from katef October 9, 2024 21:19
katef merged commit 8b53634 into main October 9, 2024
349 checks passed
katef deleted the sv/upstream-main branch October 9, 2024 21:25

4 participants