NUMA is really important for performance. There are two things to consider: thread pinning and memory pinning. Thread pinning is trivial and can be done with the usual affinity mask. The best way to pin memory is by linking against `libnuma`.
A dependency, eeww. But it's a simple dependency (just a wrapper around a few syscalls) that I'd put on the same level as `libpthread`; a necessary evil.
Let's look at a forwarding application on a NUMA system with NICs connected to both CPUs.
It will typically have at least one thread per NIC that handles incoming packets and forwards them somewhere. It might need to cross a NUMA-boundary to do so.
In our experience, it's most efficient to pin both the thread and the packet memory to the CPU node to which the NIC receiving the packets is connected. Sending from the wrong node is not as bad as receiving to the wrong node. Also, we usually can't know where a packet will be sent when we receive it, so we can't pin the memory correctly for the send side anyway.
How to implement this?
1. read `numa_node` in the NIC's sysfs directory to figure out which node it's connected to
2. use `libnuma` to set a memory policy before allocating memory for it
3. pin the thread correctly
Sounds easy, right?
But is it worth implementing? What do we gain besides added complexity?
Sure, this is obviously a must-have feature for a real-world high-performance driver.
But we've decided against implementing it for now.
Almost everyone will only ever read the code, and the NUMA handling isn't particularly interesting compared to the rest; it would just add noise.
That doesn't mean you can't use ixy on a NUMA system.
We obviously want to run some benchmarks and performance tests with different NUMA scenarios, and we are just going to use the `numactl` command for that:
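The original post doesn't show the exact command, but a typical `numactl` invocation for this kind of test might look like the following (the forwarder binary name and PCI addresses are placeholders):

```shell
# Best case: bind both memory and threads to the node the NICs are attached to.
numactl --membind=0 --cpunodebind=0 ./ixy-fwd 0000:03:00.0 0000:04:00.0

# Worst case for comparison: run everything on the "wrong" node.
numactl --membind=1 --cpunodebind=1 ./ixy-fwd 0000:03:00.0 0000:04:00.0
```

Varying `--membind` and `--cpunodebind` independently lets you separate the cost of remote memory access from the cost of running the thread on the remote node.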
That works just fine with the current memory allocator and allows us to benchmark all relevant scenarios on a NUMA system with NICs attached to both nodes.