Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bindgen: use bindgen to provide Rust bindings to C - v8 #12466

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

jasonish
Copy link
Member

@jasonish jasonish commented Jan 23, 2025

A simpler, more focused version of #12461.

  • Straight to suricata_sys crate, remove intermediate commits.
  • Only export app-layer-protos.h and update Rust code to use the generated bindings.
  • Don't both sorting out other circular dependencies.
  • Code changes to use AppProto could have been more more minimal by re-exporting from core, but I don't like the idea of add re-exports.

Follow Rust convention of using a "sys" crate for bindings to C
functions. The bindings don't exist yet, but will be generated by
bindgen and put into this crate.
Bindgen works by processing a header file which includes all other
header files it should generate bindings for. For this I've created
bindgen.h which just includes app-layer-protos.h for now as an
example.

These bindings are then generated and saved in the "suricata-sys"
crate and become availale as "suricata_sys::sys".
Have bindgen generate bindings for app-layer-protos.h, then use the
generated definitions of AppProto/AppProtoEnum instead if defining
them ourselves.

This header was chosen as its used by Rust, and its a simple header
with no circular dependencies.
Comment on lines +1 to +7
#[repr(u32)]
#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
pub enum AppProtoEnum {
ALPROTO_UNKNOWN = 0,
ALPROTO_FAILED = 1,
ALPROTO_HTTP1 = 2,
ALPROTO_FTP = 3,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also do:

pub const AppProtoEnum_ALPROTO_UNKNOWN: AppProtoEnum = 0;
pub const AppProtoEnum_ALPROTO_FAILED: AppProtoEnum = 1;
pub const AppProtoEnum_ALPROTO_HTTP1: AppProtoEnum = 2;
pub const AppProtoEnum_ALPROTO_FTP: AppProtoEnum = 3;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it right to have a rust AppProtoEnum ? when we want dynamic AppProto cf #12420 and rejected https://redmine.openinfosecfoundation.org/issues/3524 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure? That PR still uses the AppProto datatype in SNMP. Make sense to bindgen that from C to Rust for use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed I see AppProtoEnum as AppProto #12466 (comment)

Some comments around it may help like

  • AppProtoEnum is for statically defined protocols
  • AppProto is for all protocols : static and dynamic

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is generated, so I guess that doc would go where these are defined in C.

Copy link

codecov bot commented Jan 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.52%. Comparing base (5d9fad5) to head (9844b39).
Report is 17 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12466      +/-   ##
==========================================
- Coverage   80.63%   80.52%   -0.12%     
==========================================
  Files         920      923       +3     
  Lines      258739   259176     +437     
==========================================
+ Hits       208643   208705      +62     
- Misses      50096    50471     +375     
Flag Coverage Δ
fuzzcorpus 56.06% <100.00%> (-0.76%) ⬇️
livemode 19.38% <0.00%> (-0.02%) ⬇️
pcap 44.21% <100.00%> (-0.07%) ⬇️
suricata-verify 63.31% <100.00%> (+0.07%) ⬆️
unittests 58.45% <33.33%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@suricata-qa
Copy link

Information: QA ran without warnings.

Pipeline 24340

}
pub type AppProto = u16;
extern "C" {
#[doc = " \\brief Maps the ALPROTO_*, to its string equivalent.\n\n \\param alproto App layer protocol id.\n\n \\retval String equivalent for the alproto."]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We do not need to export the comment I guess

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be disabled. But still useful to have to Rustdoc IMO.

@@ -79,14 +79,49 @@ clean-local:
distclean-local:
rm -rf vendor dist

check-bindgen-bindings:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a CI test with bindgen

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I didn't pull over this test from the previous PR: https://github.com/OISF/suricata/pull/12461/files#diff-73e17259d77e5fbef83b2bdbbe4dc40a912f807472287f7f45b77e0cbf78792d

It just rebuilds the bindings to make sure they are up to date.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the CI test I am expecting :-)

pub const ALPROTO_UNKNOWN : AppProto = 0;
pub const ALPROTO_FAILED : AppProto = 1;
pub const ALPROTO_UNKNOWN : AppProto = AppProtoEnum::ALPROTO_UNKNOWN as AppProto;
pub const ALPROTO_FAILED : AppProto = AppProtoEnum::ALPROTO_FAILED as AppProto;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh AppProtoEnum. as AppProto...

@catenacyber
Copy link
Contributor

This looks nice overall.

What is the remaining work to do ?

  • list all C headers that should get into bindgen ?
  • add some pure rust stuff into suricata-sys crate ? like AppLayerTxData (including the JsonBuilder C API but feel-like-rust)
  • Add CI test

@jasonish
Copy link
Member Author

This looks nice overall.

What is the remaining work to do ?

  • list all C headers that should get into bindgen ?
  • add some pure rust stuff into suricata-sys crate ? like AppLayerTxData (including the JsonBuilder C API but feel-like-rust)
  • Add CI test

Its ready as is IMO.

But then it would probably make sense to try to replace any of our hand written extern "C" { blocks of C functions with bindgen generated ones. I woudln't attempt to do this all at once as it could be a very big job with lots of conflicts while try to keep up to date. Best tackled in smaller bits, and definitely no more additions of hand written extern "C" { blocks.

@jasonish
Copy link
Member Author

  • add some pure rust stuff into suricata-sys crate ? like AppLayerTxData (including the JsonBuilder C API but feel-like-rust)

Unfortunately Ubuntu 24.04 and the latest RHEL/Fedora/EPEL stuff ship with bindgen 0.69.5 which can't re-export our rust-bindings.h as it requires some header/struct/datatype that bindgen fails on - in detect.h specifically. But I don't know what. There are options, potentially a refactor of detect.h. Breaking out JsonBuilder into its own crate (perhaps even a utils crate with standalone rust stuff like JsonBuilder, hashing).

Or, just upgrade to the latest bindgen, 0.71.1. It has no issues consuming detect.h, rust-bindings.h.

Regenerates the `sys.rs` and looks for any difference. Check will fail
if there is a difference.
We don't keep bindgen's autogenerated do not edit line as it contains
the bindgen version which could break the CI check for out of date
bindings. So add our own do not edit line.
@victorjulien
Copy link
Member

  • add some pure rust stuff into suricata-sys crate ? like AppLayerTxData (including the JsonBuilder C API but feel-like-rust)

Unfortunately Ubuntu 24.04 and the latest RHEL/Fedora/EPEL stuff ship with bindgen 0.69.5 which can't re-export our rust-bindings.h as it requires some header/struct/datatype that bindgen fails on - in detect.h specifically. But I don't know what. There are options, potentially a refactor of detect.h. Breaking out JsonBuilder into its own crate (perhaps even a utils crate with standalone rust stuff like JsonBuilder, hashing).

Or, just upgrade to the latest bindgen, 0.71.1. It has no issues consuming detect.h, rust-bindings.h.

detect.h is quite large and somewhat messy. It could be worth trying to clean it up and break things out.

@catenacyber
Copy link
Contributor

Its ready as is IMO.

But then it would probably make sense to try to replace any of our hand written extern "C" { blocks of C functions with bindgen generated ones. I woudln't attempt to do this all at once as it could be a very big job with lots of conflicts while try to keep up to date. Best tackled in smaller bits, and definitely no more additions of hand written extern "C" { blocks.

Good to tackle smaller bits, but I wanted to see the next bits :
Do you think the suricata-sys crate should contain AppLayerTxData ? which is now defined in rust (and used by app-layer plugin)

Copy link
Contributor

@catenacyber catenacyber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI check was missing ;-)

@@ -2269,6 +2269,9 @@ fi
fi
fi

AC_PATH_PROG([BINDGEN], [bindgen], [no])
AM_CONDITIONAL([HAVE_BINDGEN], [test "x$BINDGEN" != "xno"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check bindgen version ?

@jasonish
Copy link
Member Author

Its ready as is IMO.
But then it would probably make sense to try to replace any of our hand written extern "C" { blocks of C functions with bindgen generated ones. I woudln't attempt to do this all at once as it could be a very big job with lots of conflicts while try to keep up to date. Best tackled in smaller bits, and definitely no more additions of hand written extern "C" { blocks.

Good to tackle smaller bits, but I wanted to see the next bits : Do you think the suricata-sys crate should contain AppLayerTxData ? which is now defined in rust (and used by app-layer plugin)

It makes sense, as its a C data structure, not a Rust one. But that means maybe it should be defined in C, and bindgen'd over to Rust.

I don't think we can put it in suricata-sys. cbindgen likes to work on a crate if possible, not just files so it can determine the usage of fields to make its decisions about what it generates. It doesn't make sense to run cbinden on the output of bindgen.

@jasonish
Copy link
Member Author

Its ready as is IMO.
But then it would probably make sense to try to replace any of our hand written extern "C" { blocks of C functions with bindgen generated ones. I woudln't attempt to do this all at once as it could be a very big job with lots of conflicts while try to keep up to date. Best tackled in smaller bits, and definitely no more additions of hand written extern "C" { blocks.

Good to tackle smaller bits, but I wanted to see the next bits : Do you think the suricata-sys crate should contain AppLayerTxData ? which is now defined in rust (and used by app-layer plugin)

It makes sense, as its a C data structure, not a Rust one. But that means maybe it should be defined in C, and bindgen'd over to Rust.

I don't think we can put it in suricata-sys. cbindgen likes to work on a crate if possible, not just files so it can determine the usage of fields to make its decisions about what it generates. It doesn't make sense to run cbinden on the output of bindgen.

Of course it could be done, and conceptually makes sense. Just might need lots of exclude/includes in cbindgen.toml as I don't see a way to just exclude a whole file sys.rs for instance.

@jasonish
Copy link
Member Author

jasonish commented Jan 24, 2025

Its ready as is IMO.
But then it would probably make sense to try to replace any of our hand written extern "C" { blocks of C functions with bindgen generated ones. I woudln't attempt to do this all at once as it could be a very big job with lots of conflicts while try to keep up to date. Best tackled in smaller bits, and definitely no more additions of hand written extern "C" { blocks.

Good to tackle smaller bits, but I wanted to see the next bits : Do you think the suricata-sys crate should contain AppLayerTxData ? which is now defined in rust (and used by app-layer plugin)

It makes sense, as its a C data structure, not a Rust one. But that means maybe it should be defined in C, and bindgen'd over to Rust.
I don't think we can put it in suricata-sys. cbindgen likes to work on a crate if possible, not just files so it can determine the usage of fields to make its decisions about what it generates. It doesn't make sense to run cbinden on the output of bindgen.

Of course it could be done, and conceptually makes sense. Just might need lots of exclude/includes in cbindgen.toml as I don't see a way to just exclude a whole file sys.rs for instance.

Ok, it might not make sense:

From https://doc.rust-lang.org/cargo/reference/build-scripts.html#-sys-packages

| The library crate should provide declarations for types and functions in libfoo, but not higher-level abstractions.

So that means by convention, a nice wrapper around the C API belongs somewhere else:

| It is common to have a companion package without the -sys suffix that provides a safe, high-level abstractions on top of the sys package. For example, the git2 crate provides a high-level interface to the libgit2-sys crate.

These typically apply to a pure C library being exposed, not a hybrid library like ours.

This is mostly convention. It might make sense for us to mix safe wrappers in the same crate.

@catenacyber
Copy link
Contributor

So, I guess we should have a good view on how we want to divide in crates considering the use cases, including the plugin one ;-)

@jasonish
Copy link
Member Author

So, I guess we should have a good view on how we want to divide in crates considering the use cases, including the plugin one ;-)

Yeah, I think the sys can continue before that is decided. I didn't do this work for the plugin use case, but to avoid us writing our own bindings to C functions - it would be a big win just to get rid of those I think.

@suricata-qa
Copy link

WARNING:

field baseline test %
SURI_TLPR1_stats_chk
.uptime 629 651 103.5%

Pipeline 24354

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants