Fix for some sync issues b/w qemu and dpdk-vdpa #1

asaini-xilinx · 2022-06-15T05:34:36Z

vdpa/sfc: resolve race between libvhost and sfc_vdpa_pmd

For vhost-user backend, qemu sets callfd twice for each VQ.
Once when it opens vdpa-socket and thenn again when it starts
VQ(s)

To start VQ(s) qemu issues the following messages for all VQ(s):
VHOST_USER_SET_VRING_NUM
VHOST_USER_SET_VRING_BASE
VHOST_USER_SET_VRING_ADDR
VHOST_USER_SET_VRING_KICK
VHOST_USER_SET_VRING_CALL (... this overwrites the old callfd)

After processing each message dpdk-libvhost checks vq_is_ready(),
and if vq_is_ready() returns true it calls vdpa_dev->ops->dev_conf()

static bool vq_is_ready()
{
...
    ring_ok = vq->desc && vq->avail && vq->used;
...
    return ring_ok &&
            vq->kickfd != VIRTIO_UNINITIALIZED_EVENTFD &&
            vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD &&
            vq->enabled;
}

And after processing VHOST_USER_SET_VRING_KICK message for the
last VQ vq_is_ready() return true b/c vq->callfd contains the old
value. So dev_conf is called before VHOST_USER_SET_VRING_CALL can be
processed for the last VQ, and VHOST_USER_SET_VRING_CALL message for
the last VQ is processed after dev_conf completes.

sfc_vdpa_pmd's dev_conf spawns a thread to set rte_vhost_host_notifier_ctrl()
before returning control to libvhost. This parallel thread in turn invokes
get_notify_area(). To get the notify_area, sfc_vdpa_pmd needs to
query the HW and for this query it needs an enabled VQ.

But at the same time libvhost is trying reset the callfd for the last VQ
to the new value and to do this it momentarily disables that VQ causing
a race b/w the processing VHOST_USER_SET_VRING_CALL and setting
rte_vhost_host_notifier_ctrl()

To resolve this race condition, query the HW and cache notify_area
inside dev_conf() instead of the parallel thread.

vdpa/sfc: handle sync issue between qemu and vhost-user

When DPDK app is running in the VF, it sometimes rings the doorbell
before sfc_vdpa_dev_config has had a chance to complete and hence it
misses the event. As workaround, ring the doorbell when vDPA reports
the notify_area to QEMU.

Signed-off-by: Vijay Kumar Srivastava <[email protected]>
Signed-off-by: Abhimanyu Saini <[email protected]>

Signed-off-by: Andrew Rybchenko <[email protected]>

Failsafe driver has been unable to configure some sub-devices which don't support disabling Rx drop after device info reporting had been reconsidered. Suggest enabling Rx drop by default once at least one sub-device is found to advertise this setting. Fixes: 4586be3 ("net/failsafe: fix reported device info") Cc: Stephen Hemminger <[email protected]> Cc: [email protected] Signed-off-by: Ivan Malov <[email protected]>

Xilinx virtio-net does not supports MSS smaller than 512. Signed-off-by: Andrew Rybchenko <[email protected]>

Xilinx virtio-net dies if too many segments are requested. 185 is an experimental number. Signed-off-by: Andrew Rybchenko <[email protected]>

Setting exact link speed makes sense if auto-negotiation is disabled. Fixed flag is required to disable auto-negotiation. Fixes: 88fbedc ("app/testpmd: move speed and duplex parsing in a function") Cc: [email protected] Signed-off-by: Andrew Rybchenko <[email protected]> Acked-by: Bernard Iremonger <[email protected]>

NIC is unconfigured when CMOD is used. Signed-off-by: Andrew Rybchenko <[email protected]>

Signed-off-by: Igor Romanov <[email protected]>

Some iterations of recent performance tests report assertion failure which does not give enough information. Log the RxQ state right before the assert.

Release 15 is used to identify next-sfc. Signed-off-by: Andrew Rybchenko <[email protected]>

When DPDK app is running in the VF, it sometimes rings the doorbell before sfc_vdpa_dev_config has had a chance to complete and hence it misses the event. As workaround, ring the doorbell when vDPA reports the notify_area to QEMU. Signed-off-by: Vijay Kumar Srivastava <[email protected]> Signed-off-by: Abhimanyu Saini <[email protected]>

Increase the number to defaut RX/TX queue pairs to 8, and add MQ feature flag to vDPA protocol features. Signed-off-by: Abhimanyu Saini <[email protected]>

Buffer for MCDI channel is allocated using rte_memzone_reserve_aligned with zone name 'mcdi'. Since multiple MCDI channels are needed to support multiple VF(s) and rte_memzone_reserve_aligned expects unique zone names, append pciaddr to zone name to make it unique. Signed-off-by: Abhimanyu Saini <[email protected]>

The used/avail queue indexes are not bound by queue size, because the descriptor entry index is calculated by a simple modulo between queue index and queue_size So, do not check initial used and avail queue indexes against queue size because it is possible for these indexes to be greater than queue size in the following cases: 1) The queue is created to be migrated into, or 2) The client issues a qstop/qstart after running datapath Signed-off-by: Abhimanyu Saini <[email protected]>

Regenerate MCDI headers from smartnic_registry:72940ad Signed-off-by: Abhimanyu Saini <[email protected]>

Change cidx and pidx definition to mean used queue and avail queue index respectively. Also, remove the cached copies of cidx/pidx in vq_cxt as they are never updated. Invoke rte_vhost_set_vring_base() instead. Signed-off-by: Abhimanyu Saini <[email protected]>

ol-andrewr · 2022-06-15T08:59:28Z

drivers/vdpa/sfc/sfc_vdpa_ops.c

@@ -856,6 +858,19 @@ sfc_vdpa_get_notify_area(int vid, int qid, uint64_t *offset, uint64_t *size)
 	sfc_vdpa_info(dev, "vDPA ops get_notify_area :: offset : 0x%" PRIx64,
 		      *offset);

+	pci_dev = sfc_vdpa_adapter_by_dev_handle(dev)->pdev;
+	doorbell = (uint8_t *)pci_dev->mem_resource[reg.index].addr +
+		   bar_offset + reg.offset;


Shouldn't we use already calculated *offset here?

ol-andrewr

One more point. I think it is a fix which should be backported to stable branches. So, add "Fixes: " tag with reference to earlier changeset to be fixed and add:
Cc: [email protected]

ol-andrewr · 2022-06-15T09:08:49Z

drivers/vdpa/sfc/sfc_vdpa_ops.c

 		goto fail_virtio_qstart;
 	}

 	sfc_vdpa_info(ops_data->dev_handle,
 		      "virtqueue started successfully for vq_num %d", vq_num);

+	rc = efx_virtio_get_doorbell_offset(vq,	&doorbell);


It looks like there are two spaces before &doorbeel.

ol-andrewr

LGTM, but please, avoid sfc_vdpa_pmd in the description. Just name it "driver"

b/w .. -> between the libvhost and the driver.

In SW assisted live migration, vDPA driver will stop all virtqueues and setup up SW vrings to relay the communication between the virtio driver and the vDPA device using an event driven relay thread This will allow vDPA driver to help on guest dirty page logging for live migration. Signed-off-by: Abhimanyu Saini <[email protected]>

vDPA driver allocates a MCDI buffer and maps it for DMA in the IO address space of the VF, and guest virtio driver independently allocates virtqueues and uses their corresponding guest IOVA(s). There is no guarantee that the values for these two addresses will be unique and vDPA might need to relocate mcdi IOVA. The patch adds libefx API to handle mcdi IOVA remapping. Signed-off-by: Abhimanyu Saini <[email protected]>

MCDI IOVA unmap/remap is handled by client driver, add dma_remap callback to mcdi_ops so that libefx can pass the control to client driver and move the MCDI IOVA to a new address. Signed-off-by: Abhimanyu Saini <[email protected]>

vDPA driver allocates a MCDI buffer and maps it for DMA in the IO address space of the virtual function, and the virtio-net driver independently allocates virtqueues and uses their corresponding guest IOVA(s). There is no guarantee that the values for these two addresses will be unique and vDPA might need to relocate mcdi IOVA. To resolves the problem of overlap between IOVA(s) allocated by vDPA and qemu, three changes have been made to the vDPA driver. 1) Cache all known IOVA(s) in a linked list and add functions to check overlap and resolve IOVA opverlaps. 2) Check for overlap 3) If overlap is found, find a new IOVA and invoke efx_mcdi_dma_remap Signed-off-by: Abhimanyu Saini <[email protected]>

If DPDK is built with thread sanitizer it reports a race in setting of multiprocess file descriptor. The fix is to use atomic operations when updating mp_fd. Build: $ meson -Db_sanitize=address build $ ninja -C build Simple example: $ .build/app/dpdk-testpmd -l 1-3 --no-huge EAL: Detected CPU lcores: 16 EAL: Detected NUMA nodes: 1 EAL: Static memory layout is selected, amount of reserved memory can be adjusted with -m or --socket-mem EAL: Detected static linkage of DPDK EAL: Multi-process socket /run/user/1000/dpdk/rte/mp_socket EAL: Selected IOVA mode 'VA' testpmd: No probed ethernet devices testpmd: create a new mbuf pool <mb_pool_0>: n=163456, size=2176, socket=0 testpmd: preferred mempool ops selected: ring_mp_mc EAL: Error - exiting with code: 1 Cause: Creation of mbuf pool for socket 0 failed: Cannot allocate memory ================== WARNING: ThreadSanitizer: data race (pid=87245) Write of size 4 at 0x558e04d8ff70 by main thread: #0 rte_mp_channel_cleanup <null> (dpdk-testpmd+0x1e7d30c) #1 rte_eal_cleanup <null> (dpdk-testpmd+0x1e85929) #2 rte_exit <null> (dpdk-testpmd+0x1e5bc0a) #3 mbuf_pool_create.cold <null> (dpdk-testpmd+0x274011) #4 main <null> (dpdk-testpmd+0x5cc15d) Previous read of size 4 at 0x558e04d8ff70 by thread T2: #0 mp_handle <null> (dpdk-testpmd+0x1e7c439) #1 ctrl_thread_init <null> (dpdk-testpmd+0x1e6ee1e) As if synchronized via sleep: #0 nanosleep libsanitizer/tsan/tsan_interceptors_posix.cpp:366 #1 get_tsc_freq <null> (dpdk-testpmd+0x1e92ff9) #2 set_tsc_freq <null> (dpdk-testpmd+0x1e6f2fc) #3 rte_eal_timer_init <null> (dpdk-testpmd+0x1e931a4) #4 rte_eal_init.cold <null> (dpdk-testpmd+0x29e578) #5 main <null> (dpdk-testpmd+0x5cbc45) Location is global 'mp_fd' of size 4 at 0x558e04d8ff70 (dpdk-testpmd+0x000003122f70) Thread T2 'rte_mp_handle' (tid=87248, running) created by main thread at: #0 pthread_create libsanitizer/tsan/tsan_interceptors_posix.cpp:969 #1 rte_ctrl_thread_create <null> (dpdk-testpmd+0x1e6efd0) #2 rte_mp_channel_init.cold <null> (dpdk-testpmd+0x29cb7c) #3 rte_eal_init <null> (dpdk-testpmd+0x1e8662e) #4 main <null> (dpdk-testpmd+0x5cbc45) SUMMARY: ThreadSanitizer: data race (app/dpdk-testpmd+0x1e7d30c) in rte_mp_channel_cleanup ================== ThreadSanitizer: reported 1 warnings Fixes: bacaa27 ("eal: add channel for multi-process communication") Cc: [email protected] Signed-off-by: Stephen Hemminger <[email protected]> Acked-by: Anatoly Burakov <[email protected]> Reviewed-by: Chengwen Feng <[email protected]>

ol-andrewr and others added 16 commits December 7, 2021 13:32

INTERNAL: add helper file for Xilinx reviewboard setup

45fb355

Signed-off-by: Andrew Rybchenko <[email protected]>

INTERNAL: net/sfc: enforce driver debug

0fd39ff

Signed-off-by: Andrew Rybchenko <[email protected]>

INTERNAL: net/virtio: check Xilinx limit on MSS on prepare

30ae6c7

Xilinx virtio-net does not supports MSS smaller than 512. Signed-off-by: Andrew Rybchenko <[email protected]>

INTERNAL: net/virtio: check Xilinx limit on a number of segments

a1ef17a

Xilinx virtio-net dies if too many segments are requested. 185 is an experimental number. Signed-off-by: Andrew Rybchenko <[email protected]>

INTERNAL: net/sfc: allow unconfigured NIC

808f80f

NIC is unconfigured when CMOD is used. Signed-off-by: Andrew Rybchenko <[email protected]>

app/testpmd: add command-line to set all ports loopback mode

ef30038

Signed-off-by: Igor Romanov <[email protected]>

INTERNAL: net/sfc: DEBUG: log assertion failure on scatter packet

e1d3ae5

Some iterations of recent performance tests report assertion failure which does not give enough information. Log the RxQ state right before the assert.

INTERNAL: version: 22.03.0-rc15

5affd0e

Release 15 is used to identify next-sfc. Signed-off-by: Andrew Rybchenko <[email protected]>

vdpa/sfc: enable support for multi-queue

ee58f63

Increase the number to defaut RX/TX queue pairs to 8, and add MQ feature flag to vDPA protocol features. Signed-off-by: Abhimanyu Saini <[email protected]>

common/sfc_efx/base: update MCDI headers

e0ab1fd

Regenerate MCDI headers from smartnic_registry:72940ad Signed-off-by: Abhimanyu Saini <[email protected]>

ol-andrewr approved these changes Jun 15, 2022

View reviewed changes

ol-andrewr reviewed Jun 15, 2022

View reviewed changes

ol-andrewr approved these changes Jun 15, 2022

View reviewed changes

Abhimanyu Saini added 4 commits July 15, 2022 11:13

asaini-xilinx force-pushed the main branch from 3736e32 to c957571 Compare July 18, 2022 06:08

okt-galaktionov force-pushed the main branch from ca463c1 to 05eedd4 Compare October 21, 2022 15:54

okt-galaktionov force-pushed the main branch from 05eedd4 to aa69f3e Compare November 9, 2022 14:06

okt-galaktionov force-pushed the main branch 2 times, most recently from 69cb0c6 to 76e499c Compare December 1, 2022 15:51

okt-galaktionov force-pushed the main branch 2 times, most recently from 0d9a6a2 to 8fe7daf Compare December 8, 2022 13:37

okt-galaktionov force-pushed the main branch from 8fe7daf to d9565a8 Compare January 30, 2023 10:41

okt-galaktionov force-pushed the main branch from d9565a8 to 39a7d16 Compare February 10, 2023 11:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix for some sync issues b/w qemu and dpdk-vdpa #1

Fix for some sync issues b/w qemu and dpdk-vdpa #1

asaini-xilinx commented Jun 15, 2022

ol-andrewr Jun 15, 2022

ol-andrewr left a comment

ol-andrewr Jun 15, 2022

ol-andrewr left a comment

Fix for some sync issues b/w qemu and dpdk-vdpa #1

Are you sure you want to change the base?

Fix for some sync issues b/w qemu and dpdk-vdpa #1

Conversation

asaini-xilinx commented Jun 15, 2022

ol-andrewr Jun 15, 2022

Choose a reason for hiding this comment

ol-andrewr left a comment

Choose a reason for hiding this comment

ol-andrewr Jun 15, 2022

Choose a reason for hiding this comment

ol-andrewr left a comment

Choose a reason for hiding this comment