-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sub-module for reading tc
stats
#8210
Conversation
54cabb0
to
bb0c013
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only reviewed first commit so far
below/tc/src/lib.rs
Outdated
'out: while let Ok(size) = socket.recv(&mut &mut recv_buf[offset..], 0) { | ||
loop { | ||
let bytes = &recv_buf[offset..]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two things:
- why double borrow?
- why keep track of offset? seems like you could always write-to and read-from the beginning of the buffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is mostly inspired from this example in netlink-packet-route
library
from my understanding:
recv()
accepts a mutable reference toBufMut
, I believe that's why we need double borrow. it refuses to compile otherwise.
pub fn recv<B>(&self, buf: &mut B, flags: libc::c_int) -> Result<usize>
where
B: bytes::BufMut,
{
...
}
i can change it to: &mut recv_buf[offset..].borrow_mut()
instead of &mut &mut recv_buf[offset..]
- we write to the beginning of
offset
but don't read from beginning:
Line 73 in bb0c013
let bytes = &recv_buf[offset..];
return Err(TcError::Netlink(err.to_string())); | ||
} | ||
if let NetlinkPayload::Done(_) = payload { | ||
break 'out; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
am i misunderstanding or could this be a continue
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the outer loop is for each packet received from the kernel
the inner loop is for parsing those packets into netlink object(s)
Done
implies this is the last packet for the object we requested, so we're done parsing [1]
[1] from the library example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might have discussed this before, but why create wrappers for netlink_packet_route
structs? Doesn't seem like there's much value add. And later when new fields are supported you have to wire it all up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking again, I think current approach is good. If we are going to save these types to disk, then it's better we control the layout.
21a3519
to
dff7a60
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. Just gave it a first pass with some comments.
/// The kernel responds with a message of type `RTM_NEWQDISC` for each qdisc. | ||
fn get_netlink_qdiscs() -> Result<Vec<TcMessage>> { | ||
// open a socket | ||
let socket = Socket::new(NETLINK_ROUTE).map_err(|e| TcError::Netlink(e.to_string()))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we going to support IPv6 (NETLINK_ROUTE6)? The most important thing is to make sure the sample type in types.rs
is extensible and will make sense in the future if more stats are added. Changes to it are difficult as we need to compare about forward/backward compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding, the sample type in types.rs
will not change regardless of whether we use IPv6 or IPv4. The reading part might change, ie, get_netlink_qdiscs()
but not the types that the underlying response is translated into (since we are basically interested in the qdisc fields which should be independent of the reading).
below/model/src/collector.rs
Outdated
tc: if !options.enable_tc_stats { | ||
None | ||
} else { | ||
match tc::tc_stats() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How expensive is this since we're talking over netlink sockets and it's blocking? We generally try to make sure the main collection is fast to minimize collection skew and make sure we're still able to collect data under high system load.
More expensive collection can be made "best effort" and delegated to some other thread (e.g. via AsyncCollectorPlugin
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think netlink operations would be expensive since they directly communicate with the kernel. However, I believe the receive operation is blocking. It could be slow only if there are, say 1000+ interfaces, which I am not sure is likely.
That being said, I tried to duplicate the implementation for gpu_stats
for tc
in this commit. It seems to work but would appreciate a closer look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That commit looks good. Note that the async collection means there may be more skew. e.g. TC collected at t=0s but sample collected in main thread at t=4s.
)] | ||
pub struct TcModel { | ||
#[queriable(subquery)] | ||
pub tc: Vec<SingleTcModel>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TcModel
is queriable on index (e.g. queriable field id might be like "tc.tc.<idx>.bps"
). Is there some field in SingleTcModel
that uniquely identifies the SingleTcModel
(e.g. interface?
). If so we should pull it out and make this a BTreeMap<String, SingleTcModel>
, so that it's easier to query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not straightforward because there's no clear identifier for qdiscs, also an interface could have multiple qdiscs.
From this website:
qdiscs are always referred to using a combination of a major node number and a minor node number. For any given qdisc, the major node number has to be unique for the root hook for a given Ethernet interface.
I believe we could use a combination of interface, handle and parent to uniquely identify a qdisc. We could add a key
field and assign to it the hashed or concatenated combination of the 3 fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me like it should be keyed uniquely identified by major/minor which are actually included in TcMessage
. I happen to already have a bit of code handy (we use it for identifying GPUs in the internal version).
You just need to change this to BTreeMap<MajMin, SingleTcModel>
.
use serde_with::DeserializeFromStr;
use serde_with::SerializeDisplay;
// A Queriable MajMin
#[derive(
Default,
Clone,
PartialEq,
PartialOrd,
Ord,
Eq,
Debug,
DeserializeFromStr,
SerializeDisplay,
below_derive::Queriable
)]
pub struct MajMin {
pub minor_id: u64,
pub major_id: u64,
}
impl std::fmt::Display for MajMin {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{}:{}", self.major_id, self.minor_id)
}
}
impl std::str::FromStr for MajMin {
type Err = anyhow::Error;
fn from_str(s: &str) -> Result<Self, Self::Err> {
if let Some(colon_idx) = s.find(':') {
Ok(Self {
major_id: (&s[..colon_idx]).parse::<u64>()?,
minor_id: (&s[colon_idx + 1..]).parse::<u64>()?,
})
} else {
Err(anyhow!("No colon in 'maj:min'"))
}
}
}
Tc stats is made best effort as netlink operations could be blocking.
@brianc118 Could you take a look at the PR again? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Mostly want to make sure we agree on the right unique identifier for Tc/qdiscs and make sure that's recorded in samples.
below/tc/src/types.rs
Outdated
memory_usage: qdisc.memory_usage, | ||
drop_overmemory: qdisc.drop_overmemory, | ||
}, | ||
_ => Self::default(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, would it be better to just return None
and make new
return Option<FqCodelXStats>
?
Or better, directly take TcFqCodelQdStats
and leave it to caller to figure out what to do if xstats
is not Qdisc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question for the second, makes sense to only deal with TcFqCodelQdStats
but say in the future we want to expose class stats TcFqCodelClStats
as well, then we'll have to change it back to something similar as above.
I wanted for FqCodelXStats
to represent a combined struct of Qdisc
and Class
, if it's a Qdisc
the fields relevant to Class
would be None
and vice-versa.
Or would it better to have another struct identify a classful FqCodel xstats if and when we need to? Then it would make sense to implement your second suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or would it better to have another struct identify a classful FqCodel xstats if and when we need to?
I don't have a strong preference for either - I'm not familiar with this stuff enough to have a strong opinion :), but my feeling is it might be better to split it so each struct has clear meaning? I'll leave the decision up to you.
Right now I don't think it's taking either approach though. We'd want to make the fields Option<> if we are to futureproof it for overloading with TcFqCodelClStats
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have addressed this in commit 97ac6bf
create an enum wrapper which can be extended with TcFqCodelClStats
when we decide to add classful qdiscs support
)] | ||
pub struct TcModel { | ||
#[queriable(subquery)] | ||
pub tc: Vec<SingleTcModel>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me like it should be keyed uniquely identified by major/minor which are actually included in TcMessage
. I happen to already have a bit of code handy (we use it for identifying GPUs in the internal version).
You just need to change this to BTreeMap<MajMin, SingleTcModel>
.
use serde_with::DeserializeFromStr;
use serde_with::SerializeDisplay;
// A Queriable MajMin
#[derive(
Default,
Clone,
PartialEq,
PartialOrd,
Ord,
Eq,
Debug,
DeserializeFromStr,
SerializeDisplay,
below_derive::Queriable
)]
pub struct MajMin {
pub minor_id: u64,
pub major_id: u64,
}
impl std::fmt::Display for MajMin {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{}:{}", self.major_id, self.minor_id)
}
}
impl std::str::FromStr for MajMin {
type Err = anyhow::Error;
fn from_str(s: &str) -> Result<Self, Self::Err> {
if let Some(colon_idx) = s.find(':') {
Ok(Self {
major_id: (&s[..colon_idx]).parse::<u64>()?,
minor_id: (&s[colon_idx + 1..]).parse::<u64>()?,
})
} else {
Err(anyhow!("No colon in 'maj:min'"))
}
}
}
|
||
/// `Tc` represents a traffic control qdisc. | ||
#[derive(Default, Clone, PartialEq, Debug, Serialize, Deserialize)] | ||
pub struct TcStat { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should include a new MajMin
struct to uniquely identify the qdisc. I believe the TcHandle
(maj, min) is all part of the TcHeader
of TcMessage
. https://docs.rs/netlink-packet-route/latest/netlink_packet_route/tc/struct.TcMessage.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have handle but from some testing it looks like it does not uniquely identify the qdisc.
From my understanding, a handle would uniquely identify the qdisc for a given interface and a parent. So it's the combination of all these 3 fields (interface, parent, handle), eg:
TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 1, if_name: "lo", kind: "noqueue", ... }
TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 2, if_name: "ens5", kind: "mq", ... }
TcStat { handle: 0, parent: 2, major: 0, minor: 0, if_index: 2, if_name: "ens5", kind: "fq_codel", ... }
TcStat { handle: 0, parent: 1, major: 0, minor: 0, if_index: 2, if_name: "ens5", kind: "fq_codel", ... }
TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 3, if_name: "docker0", ... }
TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 4, if_name: "br-bddb616e0d48", ... }
TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 76, if_name: "vethdc90bd5", ... }
TcStat { handle: 0, parent: 4294967295, major: 0, minor: 0, if_index: 108, if_name: "veth2dc7d25", ... }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep agreed, we'd need the combination of the 3. I'm happy with it as is, but in the long term, I think we should be keying on the unique identifier.
@@ -0,0 +1,13 @@ | |||
[package] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a README to the crate?
below/model/src/collector.rs
Outdated
tc: if !options.enable_tc_stats { | ||
None | ||
} else { | ||
match tc::tc_stats() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That commit looks good. Note that the async collection means there may be more skew. e.g. TC collected at t=0s but sample collected in main thread at t=4s.
@brianc118 not sure if you got a chance to read my last comment regarding the implementation of |
The struct `FqCodelXStats` is changed into enum. The enum has type `FqCodelQdiscStats` which represents `tc_fq_codel_qd_stats`. We can add type to represent `tc_fq_codel_cl_stats` when classful qdisc support is added.
@brianc118 added README and refactored the I will refactor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for this
@brianc118 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@brianc118 fixed the lint issues, for some reason the test seems to be running for over a day, would re-triggering help? regardless, could you please re-import to trigger the linter again? thanks! |
@brianc118 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@mmynk don't worry about the internal tests. I suspect the test hook isn't working as expected. |
@brianc118 is the PR okay to be merged? just wanted to make sure if there is something I need to fix (lint or test). thanks! |
@mmynk I was just working on vendoring the netlink-* crates internally which proved to be a pain. Expect to have this merged in the next few days. |
Oh shoot! I wish I could help with that.
Great, thanks for letting me know. |
@brianc118 merged this pull request in 4acd748. |
tc
statsnetlink-packet-route
library which readsqdisc
s via rtnetlinkResult: