Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for some sync issues b/w qemu and dpdk-vdpa #1

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

asaini-xilinx
Copy link
Contributor

vdpa/sfc: resolve race between libvhost and sfc_vdpa_pmd

For vhost-user backend, qemu sets callfd twice for each VQ.
Once when it opens vdpa-socket and thenn again when it starts
VQ(s)

To start VQ(s) qemu issues the following messages for all VQ(s):
VHOST_USER_SET_VRING_NUM
VHOST_USER_SET_VRING_BASE
VHOST_USER_SET_VRING_ADDR
VHOST_USER_SET_VRING_KICK
VHOST_USER_SET_VRING_CALL (... this overwrites the old callfd)

After processing each message dpdk-libvhost checks vq_is_ready(),
and if vq_is_ready() returns true it calls vdpa_dev->ops->dev_conf()

static bool vq_is_ready()
{
...
    ring_ok = vq->desc && vq->avail && vq->used;
...
    return ring_ok &&
            vq->kickfd != VIRTIO_UNINITIALIZED_EVENTFD &&
            vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD &&
            vq->enabled;
}

And after processing VHOST_USER_SET_VRING_KICK message for the
last VQ vq_is_ready() return true b/c vq->callfd contains the old
value. So dev_conf is called before VHOST_USER_SET_VRING_CALL can be
processed for the last VQ, and VHOST_USER_SET_VRING_CALL message for
the last VQ is processed after dev_conf completes.

sfc_vdpa_pmd's dev_conf spawns a thread to set rte_vhost_host_notifier_ctrl()
before returning control to libvhost. This parallel thread in turn invokes
get_notify_area(). To get the notify_area, sfc_vdpa_pmd needs to
query the HW and for this query it needs an enabled VQ.

But at the same time libvhost is trying reset the callfd for the last VQ
to the new value and to do this it momentarily disables that VQ causing
a race b/w the processing VHOST_USER_SET_VRING_CALL and setting
rte_vhost_host_notifier_ctrl()

To resolve this race condition, query the HW and cache notify_area
inside dev_conf() instead of the parallel thread.

vdpa/sfc: handle sync issue between qemu and vhost-user

When DPDK app is running in the VF, it sometimes rings the doorbell
before sfc_vdpa_dev_config has had a chance to complete and hence it
misses the event. As workaround, ring the doorbell when vDPA reports
the notify_area to QEMU.

Signed-off-by: Vijay Kumar Srivastava <[email protected]>
Signed-off-by: Abhimanyu Saini <[email protected]>

ol-andrewr and others added 16 commits December 7, 2021 13:32
Failsafe driver has been unable to configure some sub-devices which
don't support disabling Rx drop after device info reporting had been
reconsidered.  Suggest enabling Rx drop by default once at least one
sub-device is found to advertise this setting.

Fixes: 4586be3 ("net/failsafe: fix reported device info")
Cc: Stephen Hemminger <[email protected]>
Cc: [email protected]

Signed-off-by: Ivan Malov <[email protected]>
Xilinx virtio-net does not supports MSS smaller than 512.

Signed-off-by: Andrew Rybchenko <[email protected]>
Xilinx virtio-net dies if too many segments are requested.
185 is an experimental number.

Signed-off-by: Andrew Rybchenko <[email protected]>
Setting exact link speed makes sense if auto-negotiation is
disabled. Fixed flag is required to disable auto-negotiation.

Fixes: 88fbedc ("app/testpmd: move speed and duplex parsing in a function")
Cc: [email protected]

Signed-off-by: Andrew Rybchenko <[email protected]>
Acked-by: Bernard Iremonger <[email protected]>
NIC is unconfigured when CMOD is used.

Signed-off-by: Andrew Rybchenko <[email protected]>
Some iterations of recent performance tests report assertion failure
which does not give enough information. Log the RxQ state right before
the assert.
Release 15 is used to identify next-sfc.

Signed-off-by: Andrew Rybchenko <[email protected]>
When DPDK app is running in the VF, it sometimes rings the doorbell
before sfc_vdpa_dev_config has had a chance to complete and hence it
misses the event. As workaround, ring the doorbell when vDPA reports
the notify_area to QEMU.

Signed-off-by: Vijay Kumar Srivastava <[email protected]>
Signed-off-by: Abhimanyu Saini <[email protected]>
Increase the number to defaut RX/TX queue pairs to 8,
and add MQ feature flag to vDPA protocol features.

Signed-off-by: Abhimanyu Saini <[email protected]>
Buffer for MCDI channel is allocated using rte_memzone_reserve_aligned
with zone name 'mcdi'. Since multiple MCDI channels are needed to
support multiple VF(s) and rte_memzone_reserve_aligned expects unique
zone names, append pciaddr to zone name to make it unique.

Signed-off-by: Abhimanyu Saini <[email protected]>
The used/avail queue indexes are not bound by queue
size, because the descriptor entry index is calculated
by a simple modulo between queue index and queue_size

So, do not check initial used and avail queue indexes
against queue size because it is possible for these
indexes to be greater than queue size in the
following cases:
1) The queue is created to be migrated into, or
2) The client issues a qstop/qstart after running datapath

Signed-off-by: Abhimanyu Saini <[email protected]>
Regenerate MCDI headers from smartnic_registry:72940ad

Signed-off-by: Abhimanyu Saini <[email protected]>
Change cidx and pidx definition to mean used queue and avail
queue index respectively.

Also, remove the cached copies of cidx/pidx in vq_cxt as
they are never updated. Invoke rte_vhost_set_vring_base()
instead.

Signed-off-by: Abhimanyu Saini <[email protected]>
@@ -856,6 +858,19 @@ sfc_vdpa_get_notify_area(int vid, int qid, uint64_t *offset, uint64_t *size)
sfc_vdpa_info(dev, "vDPA ops get_notify_area :: offset : 0x%" PRIx64,
*offset);

pci_dev = sfc_vdpa_adapter_by_dev_handle(dev)->pdev;
doorbell = (uint8_t *)pci_dev->mem_resource[reg.index].addr +
bar_offset + reg.offset;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we use already calculated *offset here?

Copy link
Contributor

@ol-andrewr ol-andrewr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more point. I think it is a fix which should be backported to stable branches. So, add "Fixes: " tag with reference to earlier changeset to be fixed and add:
Cc: [email protected]

goto fail_virtio_qstart;
}

sfc_vdpa_info(ops_data->dev_handle,
"virtqueue started successfully for vq_num %d", vq_num);

rc = efx_virtio_get_doorbell_offset(vq, &doorbell);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there are two spaces before &doorbeel.

Copy link
Contributor

@ol-andrewr ol-andrewr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please, avoid sfc_vdpa_pmd in the description. Just name it "driver"

b/w .. -> between the libvhost and the driver.

Abhimanyu Saini added 4 commits July 15, 2022 11:13
In SW assisted live migration, vDPA driver will stop all virtqueues
and setup up SW vrings to relay the communication between the
virtio driver and the vDPA device using an event driven relay thread

This will allow vDPA driver to help on guest dirty page logging for
live migration.

Signed-off-by: Abhimanyu Saini <[email protected]>
vDPA driver allocates a MCDI buffer and maps it for DMA in the IO address
space of the VF, and guest virtio driver independently allocates virtqueues
and uses their corresponding guest IOVA(s). There is no guarantee that the
values for these two addresses will be unique and vDPA might need to
relocate mcdi IOVA.

The patch adds libefx API to handle mcdi IOVA remapping.

Signed-off-by: Abhimanyu Saini <[email protected]>
MCDI IOVA unmap/remap is handled by client driver, add
dma_remap callback to mcdi_ops so that libefx can pass
the control to client driver and move the MCDI IOVA to
a new address.

Signed-off-by: Abhimanyu Saini <[email protected]>
vDPA driver allocates a MCDI buffer and maps it for DMA in the IO address
space of the virtual function, and the virtio-net driver independently
allocates virtqueues and uses their corresponding guest IOVA(s). There is
no guarantee that the values for these two addresses will be unique and
vDPA might need to relocate mcdi IOVA.

To resolves the problem of overlap between IOVA(s) allocated by vDPA and
qemu, three changes have been made to the vDPA driver.

1) Cache all known IOVA(s) in a linked list and add functions
   to check overlap and resolve IOVA opverlaps.
2) Check for overlap
3) If overlap is found, find a new IOVA and invoke efx_mcdi_dma_remap

Signed-off-by: Abhimanyu Saini <[email protected]>
okt-galaktionov pushed a commit that referenced this pull request Oct 21, 2022
If DPDK is built with thread sanitizer it reports a race
in setting of multiprocess file descriptor. The fix is to
use atomic operations when updating mp_fd.

Build:
$ meson -Db_sanitize=address build
$ ninja -C build

Simple example:
$ .build/app/dpdk-testpmd -l 1-3 --no-huge
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 1
EAL: Static memory layout is selected, amount of reserved memory can be adjusted with -m or --socket-mem
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /run/user/1000/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
testpmd: No probed ethernet devices
testpmd: create a new mbuf pool <mb_pool_0>: n=163456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
EAL: Error - exiting with code: 1
  Cause: Creation of mbuf pool for socket 0 failed: Cannot allocate memory
==================
WARNING: ThreadSanitizer: data race (pid=87245)
  Write of size 4 at 0x558e04d8ff70 by main thread:
    #0 rte_mp_channel_cleanup <null> (dpdk-testpmd+0x1e7d30c)
    #1 rte_eal_cleanup <null> (dpdk-testpmd+0x1e85929)
    #2 rte_exit <null> (dpdk-testpmd+0x1e5bc0a)
    #3 mbuf_pool_create.cold <null> (dpdk-testpmd+0x274011)
    #4 main <null> (dpdk-testpmd+0x5cc15d)

  Previous read of size 4 at 0x558e04d8ff70 by thread T2:
    #0 mp_handle <null> (dpdk-testpmd+0x1e7c439)
    #1 ctrl_thread_init <null> (dpdk-testpmd+0x1e6ee1e)

  As if synchronized via sleep:
    #0 nanosleep libsanitizer/tsan/tsan_interceptors_posix.cpp:366
    #1 get_tsc_freq <null> (dpdk-testpmd+0x1e92ff9)
    #2 set_tsc_freq <null> (dpdk-testpmd+0x1e6f2fc)
    #3 rte_eal_timer_init <null> (dpdk-testpmd+0x1e931a4)
    #4 rte_eal_init.cold <null> (dpdk-testpmd+0x29e578)
    #5 main <null> (dpdk-testpmd+0x5cbc45)

  Location is global 'mp_fd' of size 4 at 0x558e04d8ff70 (dpdk-testpmd+0x000003122f70)

  Thread T2 'rte_mp_handle' (tid=87248, running) created by main thread at:
    #0 pthread_create libsanitizer/tsan/tsan_interceptors_posix.cpp:969
    #1 rte_ctrl_thread_create <null> (dpdk-testpmd+0x1e6efd0)
    #2 rte_mp_channel_init.cold <null> (dpdk-testpmd+0x29cb7c)
    #3 rte_eal_init <null> (dpdk-testpmd+0x1e8662e)
    #4 main <null> (dpdk-testpmd+0x5cbc45)

SUMMARY: ThreadSanitizer: data race (app/dpdk-testpmd+0x1e7d30c) in rte_mp_channel_cleanup
==================
ThreadSanitizer: reported 1 warnings

Fixes: bacaa27 ("eal: add channel for multi-process communication")
Cc: [email protected]

Signed-off-by: Stephen Hemminger <[email protected]>
Acked-by: Anatoly Burakov <[email protected]>
Reviewed-by: Chengwen Feng <[email protected]>
@okt-galaktionov okt-galaktionov force-pushed the main branch 2 times, most recently from 69cb0c6 to 76e499c Compare December 1, 2022 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants