Add sub-module for reading `tc` stats #8210

mmynk · 2023-10-31T20:07:43Z

Adds a sub-crate for reading `tc` stats.
It uses the netlink-packet-route library to read qdiscs via rtnetlink.

Result:

$ below dump tc -b '5s ago'
Datetime            Interface   Kind       Queue Length   Bps        Pps        Bytes      Packets    Backlog    Drops      Requeues   Overlimits   MaxPacket   EcnMark    NewFlowsLen   OldFlowsLen   CeMark     DropOverlimit   NewFlowCount   MemoryUsage   DropOvermemory   Target     Limit      Interval   Ecn        Quantum    CeThreshold   DropBatchSize   MemoryLimit   Flows      Timestamp
2024-01-30 20:24:31 lo          noqueue    0              0.0 B/s    0/s        0.0 B/s    0/s        0/s        0/s        0/s        0/s          ?           ?          ?             ?             ?          ?               ?              ?             ?                ?          ?          ?          ?          ?          ?             ?               ?             ?          1706646271
2024-01-30 20:24:31 ens5        mq         0              0.0 B/s    0/s        167 B/s    2/s        0/s        0/s        0/s        0/s          ?           ?          ?             ?             ?          ?               ?              ?             ?                ?          ?          ?          ?          ?          ?             ?               ?             ?          1706646271
2024-01-30 20:24:31 ens5        fq_codel   0              0.0 B/s    0/s        0.0 B/s    0/s        0/s        0/s        0/s        0/s          110         0          0             0             0          0/s             0/s            0/s           0/s              4999       10240      99999      1          1514       0             64              33554432      0/s        1706646271
2024-01-30 20:24:31 ens5        fq_codel   0              0.0 B/s    0/s        167 B/s    2/s        0/s        0/s        0/s        0/s          182         0          0             0             0          0/s             0/s            0/s           0/s              4999       10240      99999      1          1514       0             64              33554432      0/s        1706646271

danobi

only reviewed first commit so far

below/tc/src/lib.rs

danobi · 2024-02-15T18:38:27Z

below/tc/src/lib.rs

+    'out: while let Ok(size) = socket.recv(&mut &mut recv_buf[offset..], 0) {
+        loop {
+            let bytes = &recv_buf[offset..];


two things:

why double borrow?

why keep track of offset? seems like you could always write-to and read-from the beginning of the buffer

this is mostly inspired by this example in the netlink-packet-route library
from my understanding:

recv() takes a mutable reference to a type that implements BufMut, and I believe that's why we need the double borrow. it refuses to compile otherwise.

pub fn recv<B>(&self, buf: &mut B, flags: libc::c_int) -> Result<usize> where B: bytes::BufMut, { ... }

i can change it to: &mut recv_buf[offset..].borrow_mut() instead of &mut &mut recv_buf[offset..]

we write starting at offset, and we also read starting at offset rather than from the beginning of the buffer:

below/below/tc/src/lib.rs

Line 73 in bb0c013

let bytes = &recv_buf[offset..];
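
A minimal sketch of why the double borrow is needed, assuming the trait is implemented for `&mut [u8]` itself, as `bytes::BufMut` is. `BufMutLike` and `fake_recv` are hypothetical stand-ins so that the example compiles without external crates; they are not the real `bytes` or netlink APIs.

```rust
// Hypothetical, simplified stand-in for `bytes::BufMut` (one method only).
trait BufMutLike {
    fn put_slice(&mut self, src: &[u8]);
}

// The trait is implemented for the reference type `&mut [u8]` itself, so
// advancing the cursor means re-pointing the reference at the unwritten tail.
impl BufMutLike for &mut [u8] {
    fn put_slice(&mut self, src: &[u8]) {
        let (head, tail) = std::mem::take(self).split_at_mut(src.len());
        head.copy_from_slice(src);
        *self = tail;
    }
}

// Same shape as the `recv` signature quoted above: it wants `&mut B`.
// The three payload bytes are made up for illustration.
fn fake_recv<B: BufMutLike>(buf: &mut B) -> usize {
    let payload = [1u8, 2, 3];
    buf.put_slice(&payload);
    payload.len()
}

fn main() {
    let mut recv_buf = vec![0u8; 8];
    let offset = 2;
    // B is `&mut [u8]`, so the argument has to be `&mut (&mut [u8])`.
    let size = fake_recv(&mut &mut recv_buf[offset..]);
    println!("{} bytes written, buffer = {:?}", size, recv_buf);
}
```

With a single `&mut recv_buf[offset..]`, `B` would have to be the unsized `[u8]`, which does not implement the trait, so the call is rejected at compile time.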

danobi · 2024-02-15T18:39:12Z

below/tc/src/lib.rs

+                return Err(TcError::Netlink(err.to_string()));
+            }
+            if let NetlinkPayload::Done(_) = payload {
+                break 'out;


am i misunderstanding or could this be a continue?

the outer loop is for each packet received from the kernel
the inner loop is for parsing those packets into netlink object(s)
Done implies this is the last packet for the object we requested, so we're done parsing [1]

[1] from the library example
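
The two-loop structure described above can be sketched with mock data. `Payload` and `collect_qdiscs` are hypothetical names and no socket is involved; each inner `Vec` stands for one datagram returned by a receive call.

```rust
// Hypothetical stand-in for the parsed netlink payloads.
#[derive(Debug, Clone, PartialEq)]
enum Payload {
    Qdisc(u32), // stands in for one RTM_NEWQDISC message
    Done,       // marks the end of the multi-part reply
}

// Outer loop: one iteration per datagram received from the kernel.
// Inner loop: one iteration per message parsed out of that datagram.
fn collect_qdiscs(datagrams: &[Vec<Payload>]) -> Vec<u32> {
    let mut qdiscs = Vec::new();
    'out: for datagram in datagrams {
        for message in datagram {
            match message {
                Payload::Qdisc(id) => qdiscs.push(*id),
                // Leave both loops: nothing more belongs to this request.
                Payload::Done => break 'out,
            }
        }
    }
    qdiscs
}

fn main() {
    let datagrams = vec![
        vec![Payload::Qdisc(1), Payload::Qdisc(2)],
        vec![Payload::Qdisc(3), Payload::Done],
        vec![Payload::Qdisc(99)], // never read: Done was already seen
    ];
    println!("{:?}", collect_qdiscs(&datagrams));
}
```

With a real blocking socket, going back to the outer loop after `Done` would presumably issue another receive call and wait for a datagram that may never arrive, which is the reason for `break 'out` rather than `continue`.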

below/tc/src/lib.rs

below/tc/src/test.rs

below/tc/src/types.rs

danobi · 2024-02-15T18:50:16Z

below/tc/src/types.rs

We might have discussed this before, but why create wrappers for netlink_packet_route structs? Doesn't seem like there's much value add. And later when new fields are supported you have to wire it all up

Thinking again, I think current approach is good. If we are going to save these types to disk, then it's better we control the layout.

brianc118

Thanks for this. Just gave it a first pass with some comments.

below/model/src/tc_model.rs

below/tc/src/types.rs

brianc118 · 2024-03-07T15:50:42Z

below/tc/src/lib.rs

+/// The kernel responds with a message of type `RTM_NEWQDISC` for each qdisc.
+fn get_netlink_qdiscs() -> Result<Vec<TcMessage>> {
+    // open a socket
+    let socket = Socket::new(NETLINK_ROUTE).map_err(|e| TcError::Netlink(e.to_string()))?;


Are we going to support IPv6 (NETLINK_ROUTE6)? The most important thing is to make sure the sample type in types.rs is extensible and will make sense in the future if more stats are added. Changes to it are difficult as we need to care about forward/backward compatibility.

From my understanding, the sample type in types.rs will not change regardless of whether we use IPv6 or IPv4. The reading part might change, ie, get_netlink_qdiscs() but not the types that the underlying response is translated into (since we are basically interested in the qdisc fields which should be independent of the reading).

brianc118 · 2024-03-07T16:00:08Z

below/model/src/collector.rs

+        tc: if !options.enable_tc_stats {
+            None
+        } else {
+            match tc::tc_stats() {


How expensive is this since we're talking over netlink sockets and it's blocking? We generally try to make sure the main collection is fast to minimize collection skew and make sure we're still able to collect data under high system load.

More expensive collection can be made "best effort" and delegated to some other thread (e.g. via AsyncCollectorPlugin).

I do not think netlink operations would be expensive since they directly communicate with the kernel. However, I believe the receive operation is blocking. It could be slow only if there are, say 1000+ interfaces, which I am not sure is likely.

That being said, I tried to duplicate the implementation for gpu_stats for tc in this commit. It seems to work but would appreciate a closer look.

That commit looks good. Note that the async collection means there may be more skew. e.g. TC collected at t=0s but sample collected in main thread at t=4s.
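
A rough sketch of the best-effort pattern discussed here, using only the standard library. It is not the `AsyncCollectorPlugin` API; `TcSample` and `LatestTc` are hypothetical names. A background thread publishes the latest sample, and the main collector takes whatever is available without waiting on netlink.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Instant;

// Hypothetical sample type; the real tc sample carries per-qdisc stats.
#[derive(Debug, Clone, PartialEq)]
struct TcSample {
    collected_at: Instant,
    qdisc_count: usize,
}

// Shared slot holding the most recent sample, if any.
#[derive(Clone, Default)]
struct LatestTc(Arc<Mutex<Option<TcSample>>>);

impl LatestTc {
    // Called from the background thread after each (possibly slow) read.
    fn publish(&self, sample: TcSample) {
        *self.0.lock().unwrap() = Some(sample);
    }

    // Called from the main collector: waits only for the lock, never for netlink.
    fn take(&self) -> Option<TcSample> {
        self.0.lock().unwrap().take()
    }
}

fn main() {
    let slot = LatestTc::default();
    let writer = slot.clone();
    let worker = thread::spawn(move || {
        writer.publish(TcSample {
            collected_at: Instant::now(),
            qdisc_count: 4,
        });
    });
    worker.join().unwrap();

    match slot.take() {
        // The elapsed time is the skew between tc collection and the main sample.
        Some(sample) => println!(
            "{} qdiscs, skew {:?}",
            sample.qdisc_count,
            sample.collected_at.elapsed()
        ),
        None => println!("no tc sample yet"),
    }
}
```

Recording `collected_at` alongside the data is one way to make the skew mentioned above visible to whoever reads the sample.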

below/model/src/collector_plugin.rs

brianc118 · 2024-03-07T16:44:44Z

below/model/src/tc_model.rs

+)]
+pub struct TcModel {
+    #[queriable(subquery)]
+    pub tc: Vec<SingleTcModel>,


TcModel is queriable on index (e.g. queriable field id might be like "tc.tc.<idx>.bps"). Is there some field in SingleTcModel that uniquely identifies the SingleTcModel (e.g. interface?). If so we should pull it out and make this a BTreeMap<String, SingleTcModel>, so that it's easier to query.

It is not straightforward because there's no clear identifier for qdiscs, also an interface could have multiple qdiscs.
From this website:

qdiscs are always referred to using a combination of a major node number and a minor node number. For any given qdisc, the major node number has to be unique for the root hook for a given Ethernet interface.

I believe we could use a combination of interface, handle and parent to uniquely identify a qdisc. We could add a key field and assign to it the hashed or concatenated combination of the 3 fields.
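
For reference, a small sketch of how a 32-bit tc handle splits into the major and minor numbers the quote refers to. It follows the kernel's `TC_H_MAJ`/`TC_H_MIN` convention (upper 16 bits are the major, lower 16 bits the minor) and treats `0xFFFFFFFF` as the root marker; the helper names are hypothetical.

```rust
// 0xFFFFFFFF (4294967295) is the kernel's TC_H_ROOT marker.
const TC_H_ROOT: u32 = 0xFFFF_FFFF;

// Upper 16 bits are the major number, lower 16 bits the minor number.
fn split_handle(handle: u32) -> (u16, u16) {
    ((handle >> 16) as u16, (handle & 0xFFFF) as u16)
}

// Render a handle as hexadecimal `major:minor`, or `root` for TC_H_ROOT.
fn format_handle(handle: u32) -> String {
    if handle == TC_H_ROOT {
        "root".to_string()
    } else {
        let (major, minor) = split_handle(handle);
        format!("{:x}:{:x}", major, minor)
    }
}

fn main() {
    for handle in [0u32, 1, 2, 0x0001_0002, TC_H_ROOT] {
        println!("{:>10} -> {}", handle, format_handle(handle));
    }
}
```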

It seems to me like it should be uniquely keyed by major/minor, which are actually included in TcMessage. I happen to already have a bit of code handy (we use it for identifying GPUs in the internal version).

You just need to change this to BTreeMap<MajMin, SingleTcModel>.

use anyhow::anyhow;
use serde_with::DeserializeFromStr;
use serde_with::SerializeDisplay;

// A Queriable MajMin
#[derive(
    Default,
    Clone,
    PartialEq,
    PartialOrd,
    Ord,
    Eq,
    Debug,
    DeserializeFromStr,
    SerializeDisplay,
    below_derive::Queriable
)]
pub struct MajMin {
    pub minor_id: u64,
    pub major_id: u64,
}

impl std::fmt::Display for MajMin {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{}:{}", self.major_id, self.minor_id)
    }
}

impl std::str::FromStr for MajMin {
    type Err = anyhow::Error;

    // Parses the "maj:min" form produced by Display.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if let Some(colon_idx) = s.find(':') {
            Ok(Self {
                major_id: (&s[..colon_idx]).parse::<u64>()?,
                minor_id: (&s[colon_idx + 1..]).parse::<u64>()?,
            })
        } else {
            Err(anyhow!("No colon in 'maj:min'"))
        }
    }
}

mmynk · 2024-04-01T19:25:11Z

@brianc118 Could you take a look at the PR again? Thanks!

brianc118

Looks good. Mostly want to make sure we agree on the right unique identifier for Tc/qdiscs and make sure that's recorded in samples.

below/tc/src/types.rs

brianc118 · 2024-04-03T14:59:40Z

below/tc/src/types.rs

+                memory_usage: qdisc.memory_usage,
+                drop_overmemory: qdisc.drop_overmemory,
+            },
+            _ => Self::default(),


In this case, would it be better to just return None and make new return Option<FqCodelXStats>?

Or better, directly take TcFqCodelQdStats and leave it to caller to figure out what to do if xstats is not Qdisc.

question for the second, makes sense to only deal with TcFqCodelQdStats but say in the future we want to expose class stats TcFqCodelClStats as well, then we'll have to change it back to something similar as above.
I wanted for FqCodelXStats to represent a combined struct of Qdisc and Class, if it's a Qdisc the fields relevant to Class would be None and vice-versa.

Or would it be better to have another struct identify a classful FqCodel xstats if and when we need to? Then it would make sense to implement your second suggestion.

Or would it be better to have another struct identify a classful FqCodel xstats if and when we need to?

I don't have a strong preference for either - I'm not familiar with this stuff enough to have a strong opinion :), but my feeling is it might be better to split it so each struct has clear meaning? I'll leave the decision up to you.

Right now I don't think it's taking either approach though. We'd want to make the fields Option<> if we are to futureproof it for overloading with TcFqCodelClStats.

have addressed this in commit 97ac6bf
create an enum wrapper which can be extended with TcFqCodelClStats when we decide to add classful qdiscs support
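
A sketch of the enum-wrapper shape described above. `RawQdStats` and `RawXStats` are hypothetical stand-ins for the library's xstats types, only two of the fields are shown, and returning `Option` from `new` is an assumption taken from the review suggestion rather than a statement about what the commit does.

```rust
// Hypothetical stand-in for the library's qdisc xstats (TcFqCodelQdStats).
#[derive(Debug, Clone, PartialEq, Default)]
struct RawQdStats {
    memory_usage: u32,
    drop_overmemory: u32,
}

// Hypothetical stand-in for the library's xstats enum.
#[derive(Debug, Clone, PartialEq)]
enum RawXStats {
    Qdisc(RawQdStats),
    Class,
}

#[derive(Debug, Clone, PartialEq, Default)]
struct FqCodelQdiscStats {
    memory_usage: u32,
    drop_overmemory: u32,
}

// Wrapper enum: one variant today, room for a class-stats variant later.
#[derive(Debug, Clone, PartialEq)]
enum FqCodelXStats {
    FqCodelQdiscStats(FqCodelQdiscStats),
}

impl FqCodelXStats {
    // Returns None for xstats kinds that are not modelled yet.
    fn new(raw: &RawXStats) -> Option<Self> {
        match raw {
            RawXStats::Qdisc(qd) => Some(Self::FqCodelQdiscStats(FqCodelQdiscStats {
                memory_usage: qd.memory_usage,
                drop_overmemory: qd.drop_overmemory,
            })),
            RawXStats::Class => None,
        }
    }
}

fn main() {
    let raw = RawXStats::Qdisc(RawQdStats {
        memory_usage: 1024,
        drop_overmemory: 0,
    });
    println!("{:?}", FqCodelXStats::new(&raw));
    println!("{:?}", FqCodelXStats::new(&RawXStats::Class));
}
```

Adding classful support later would then mean adding a variant and a match arm, without changing the existing qdisc struct.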

brianc118 · 2024-04-03T15:27:41Z

below/tc/src/types.rs

+
+/// `Tc` represents a traffic control qdisc.
+#[derive(Default, Clone, PartialEq, Debug, Serialize, Deserialize)]
+pub struct TcStat {


I think you should include a new MajMin struct to uniquely identify the qdisc. I believe the TcHandle (maj, min) is all part of the TcHeader of TcMessage. https://docs.rs/netlink-packet-route/latest/netlink_packet_route/tc/struct.TcMessage.html

We already have handle but from some testing it looks like it does not uniquely identify the qdisc.
From my understanding, a handle would uniquely identify the qdisc for a given interface and a parent. So it's the combination of all these 3 fields (interface, parent, handle), eg:

TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 1, if_name: "lo", kind: "noqueue", ... }
TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 2, if_name: "ens5", kind: "mq", ... }
TcStat { handle: 0, parent: 2, major: 0, minor: 0, if_index: 2, if_name: "ens5", kind: "fq_codel", ... }
TcStat { handle: 0, parent: 1, major: 0, minor: 0, if_index: 2, if_name: "ens5", kind: "fq_codel", ... }
TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 3, if_name: "docker0", ... }
TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 4, if_name: "br-bddb616e0d48", ... }
TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 76, if_name: "vethdc90bd5", ... }
TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 108, if_name: "veth2dc7d25", ... }

Yep agreed, we'd need the combination of the 3. I'm happy with it as is, but in the long term, I think we should be keying on the unique identifier.
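
One possible shape for that long-term key, sketched with the first four rows of the output above. `QdiscKey` and `sample_rows` are hypothetical names; the PR itself keeps the `Vec` and defers keying to a follow-up.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical composite key: interface index, parent and handle together.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct QdiscKey {
    if_index: u32,
    parent: u32,
    handle: u32,
}

// (if_index, parent, handle, kind) taken from the TcStat output above.
fn sample_rows() -> Vec<(u32, u32, u32, &'static str)> {
    vec![
        (1, 4294967295, 0, "noqueue"),
        (2, 4294967295, 0, "mq"),
        (2, 2, 0, "fq_codel"),
        (2, 1, 0, "fq_codel"),
    ]
}

// Key each qdisc by the composite key; the value here is just its kind.
fn keyed_by_composite() -> BTreeMap<QdiscKey, &'static str> {
    sample_rows()
        .into_iter()
        .map(|(if_index, parent, handle, kind)| {
            (QdiscKey { if_index, parent, handle }, kind)
        })
        .collect()
}

fn main() {
    let by_key = keyed_by_composite();
    let handles: BTreeSet<u32> = sample_rows().iter().map(|row| row.2).collect();
    println!("distinct composite keys: {}", by_key.len());
    println!("distinct handles alone: {}", handles.len());
}
```

In this sample every handle is 0, so the handle alone collapses all rows into one entry, while the three-field key keeps them distinct.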

brianc118 · 2024-04-03T15:32:19Z

below/tc/Cargo.toml

@@ -0,0 +1,13 @@
+[package]


Can you also add a README to the crate?

mmynk · 2024-04-10T17:05:47Z

@brianc118 not sure if you got a chance to read my last comment regarding the implementation of maj:min identifier for qdiscs: #8210 (comment)
I think that's the only remaining part for this PR. I'll add README in the meanwhile.

mmynk · 2024-04-17T02:47:06Z

@brianc118 added README and refactored the fq_codel xstats definition. Could you please take another look?

I will refactor TcModel to be uniquely identifiable in a subsequent PR. Thanks!

brianc118

Looks good, thanks for this

below/model/src/tc_model.rs

facebook-github-bot · 2024-04-17T19:51:13Z

@brianc118 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

mmynk · 2024-04-18T22:20:59Z

@brianc118 fixed the lint issues, for some reason the test seems to be running for over a day, would re-triggering help? regardless, could you please re-import to trigger the linter again? thanks!

facebook-github-bot · 2024-04-19T13:00:46Z

@brianc118 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

brianc118 · 2024-04-19T13:03:07Z

@mmynk don't worry about the internal tests. I suspect the test hook isn't working as expected.

mmynk · 2024-04-22T18:16:01Z

@brianc118 is the PR okay to be merged? just wanted to make sure if there is something I need to fix (lint or test). thanks!

brianc118 · 2024-04-22T19:35:18Z

@mmynk I was just working on vendoring the netlink-* crates internally which proved to be a pain. Expect to have this merged in the next few days.

mmynk · 2024-04-22T19:54:25Z

@mmynk I was just working on vendoring the netlink-* crates internally which proved to be a pain.

Oh shoot! I wish I could help with that.

Expect to have this merged in the next few days.

Great, thanks for letting me know.

facebook-github-bot · 2024-04-25T13:32:42Z

@brianc118 merged this pull request in 4acd748.

facebook-github-bot added the CLA Signed label Oct 31, 2023

mmynk force-pushed the feat/tc branch from b7e8074 to 93f1129 Compare November 2, 2023 13:40

mmynk force-pushed the feat/tc branch 10 times, most recently from 54cabb0 to bb0c013 Compare January 31, 2024 19:04

mmynk marked this pull request as ready for review January 31, 2024 19:25

danobi reviewed Feb 15, 2024

mmynk force-pushed the feat/tc branch 2 times, most recently from 21a3519 to dff7a60 Compare February 19, 2024 09:36

brianc118 suggested changes Mar 7, 2024

mmynk force-pushed the feat/tc branch from 39229a7 to 5ec54b1 Compare March 17, 2024 00:47

mmynk added 11 commits April 1, 2024 16:42

Add sub-module for reading tc stats

2b00bef

Add model to represent Tc object

5f826b2

Add dump command for tc

2282fcd

Add render configs for tc

8ac8604

Add config knobs

28bc14f

Sync dependencies

ba6476d

rustfmt on changed files

91221b3

Use borrow trait

85b1242

Add documentation for public fields

0f448d2

Explicit imports over star

54ae812

Revert rustfmt change

564b43d

mmynk force-pushed the feat/tc branch from 5ec54b1 to 83f218f Compare April 1, 2024 17:20

mmynk added 4 commits April 1, 2024 17:45

Rename Tc type to TcStat

158d66a

Pass only the relevant interface to TcStat::new

5b77fd9

Add an async collector plugin for tc stats

6985d1d

Tc stats is made best effort as netlink operations could be blocking.

Fix length of common field vec

8fe69d0

mmynk force-pushed the feat/tc branch from 83f218f to 8fe69d0 Compare April 1, 2024 19:24

brianc118 suggested changes Apr 3, 2024

Add README for tc crate

c4eeb0e

mmynk force-pushed the feat/tc branch from 4a2b496 to 81f8aa5 Compare April 17, 2024 00:17

Refactor fq_codel xstats

97ac6bf

The struct `FqCodelXStats` is changed into enum. The enum has type `FqCodelQdiscStats` which represents `tc_fq_codel_qd_stats`. We can add type to represent `tc_fq_codel_cl_stats` when classful qdisc support is added.

mmynk force-pushed the feat/tc branch from 81f8aa5 to 97ac6bf Compare April 17, 2024 02:36

brianc118 approved these changes Apr 17, 2024

mmynk added 3 commits April 17, 2024 14:40

Merge branch 'main' into feat/tc

9774885

Use wrapper below_derive::queriable_derives

d11374f

Update deps from merge

76fef69

Fix lint issues

3d987a7

facebook-github-bot closed this in 4acd748 Apr 25, 2024

facebook-github-bot added the Merged label Apr 25, 2024

mmynk deleted the feat/tc branch April 25, 2024 17:23
