This repository has been archived by the owner on Dec 9, 2024. It is now read-only.
Following the instructions on how to integrate LST in CMSSW and then running step3 with more than one thread/stream results in a segmentation violation:
A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.
Fri Jul 28 12:58:39 PDT 2023
Thread 6 (Thread 0x7fc0b623a700 (LWP 2501614) "cmsRun"):
#0 0x00007fc196191d96 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1 0x00007fc196191e88 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2 0x00007fc1847f4812 in ?? () from /lib64/libcuda.so.1
#3 0x00007fc184804b98 in ?? () from /lib64/libcuda.so.1
#4 0x00007fc1961891cf in start_thread () from /lib64/libpthread.so.0
#5 0x00007fc195df5dd3 in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7fc11e0f1700 (LWP 2501363) "cmsRun"):
#0 0x00007fc195eb6658 in nanosleep () from /lib64/libc.so.6
#1 0x00007fc195eb655e in sleep () from /lib64/libc.so.6
#2 0x00007fc18f0ee360 in sig_pause_for_stacktrace () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3 <signal handler called>
#4 0x00007fc0edcfc292 in __gnu_cxx::__aligned_membuf<std::pair<unsigned int const, float> >::_M_ptr (this=0x7fc09be363a0) at /cvmfs/cms.cern.ch/el8_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/include/c++/10.3.0/ext/aligned_buffer.h:77
#5 0x00007fc0edcfc116 in std::_Rb_tree_node<std::pair<unsigned int const, float> >::_M_valptr (this=0x7fc09be36380) at /cvmfs/cms.cern.ch/el8_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/include/c++/10.3.0/bits/stl_tree.h:239
#6 0x00007fc0edcfbc73 in std::_Rb_tree<unsigned int, std::pair<unsigned int const, float>, std::_Select1st<std::pair<unsigned int const, float> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, float> > >::_S_key (__x=0x7fc09be36380) at /cvmfs/cms.cern.ch/el8_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/include/c++/10.3.0/bits/stl_tree.h:785
#7 0x00007fc0edcfbe36 in std::_Rb_tree<unsigned int, std::pair<unsigned int const, float>, std::_Select1st<std::pair<unsigned int const, float> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, float> > >::_M_lower_bound (this=0x7fc09c223c50, __x=0x7fc09be36380, __y=0x7fc09b1450d0, __k=@0x7fc11e0ea0dc: 442241098) at /cvmfs/cms.cern.ch/el8_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/include/c++/10.3.0/bits/stl_tree.h:1935
#8 0x00007fc0edcfbb7d in std::_Rb_tree<unsigned int, std::pair<unsigned int const, float>, std::_Select1st<std::pair<unsigned int const, float> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, float> > >::lower_bound (this=0x7fc09c223c50, __k=@0x7fc11e0ea0dc: 442241098) at /cvmfs/cms.cern.ch/el8_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/include/c++/10.3.0/bits/stl_tree.h:1277
#9 0x00007fc0edcfb865 in std::map<unsigned int, float, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, float> > >::lower_bound (this=0x7fc09c223c50, __x=@0x7fc11e0ea0dc: 442241098) at /cvmfs/cms.cern.ch/el8_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/include/c++/10.3.0/bits/stl_map.h:1259
#10 0x00007fc0edcfb5cc in std::map<unsigned int, float, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, float> > >::operator[] (this=0x7fc09c223c50, __k=@0x7fc11e0ea0dc: 442241098) at /cvmfs/cms.cern.ch/el8_amd64_gcc10/external/gcc/10.3.0-84898dea653199466402e67d73657f10/include/c++/10.3.0/bits/stl_map.h:497
#11 0x00007fc0edd10ed1 in SDL::loadModulesFromFile (modulesInGPU=..., nModules=@0x7fc0ee5c67b8: 26401, nLowerModules=@0x7fc0ee5c67ba: 13200, pixelMapping=..., stream=0x0, moduleMetaDataFilePath=0x7fc09a8dca00 "/home/users/evourlio/LSTinCMSSW/cgpu-1/CMSSW_13_0_0_pre4/src/../../../TrackLooper/data/centroid_CMSSW_12_2_0_pre2.txt") at Module.cu:350
#12 0x00007fc0edcff622 in SDL::initModules (moduleMetaDataFilePath=0x7fc09a8dca00 "/home/users/evourlio/LSTinCMSSW/cgpu-1/CMSSW_13_0_0_pre4/src/../../../TrackLooper/data/centroid_CMSSW_12_2_0_pre2.txt") at Event.cu:490
#13 0x00007fc0edceff84 in SDL::LST::eventSetup (this=0x7fc101f59918) at LST.cc:10
#14 0x00007fc0fe03bc0a in alpaka_cuda_async::LSTProducer::acquire(edm::Event const&, edm::EventSetup const&, edm::WaitingTaskWithArenaHolder) () from /home/users/evourlio/LSTinCMSSW/cgpu-1/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/pluginRecoTrackerLSTPluginsPortableCudaAsync.so
#15 0x00007fc198a39298 in edm::stream::doAcquireIfNeeded(edm::stream::impl::ExternalWork*, edm::Event const&, edm::EventSetup const&, edm::WaitingTaskWithArenaHolder&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/libFWCoreFramework.so
#16 0x00007fc198a3779a in edm::stream::EDProducerAdaptorBase::doAcquire(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*, edm::WaitingTaskWithArenaHolder&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/libFWCoreFramework.so
#17 0x00007fc198a0ace9 in edm::Worker::runAcquire(edm::EventTransitionInfo const&, edm::ParentContext const&, edm::WaitingTaskWithArenaHolder&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/libFWCoreFramework.so
#18 0x00007fc198a0ae7e in edm::Worker::runAcquireAfterAsyncPrefetch(std::__exception_ptr::exception_ptr, edm::EventTransitionInfo const&, edm::ParentContext const&, edm::WaitingTaskWithArenaHolder) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/libFWCoreFramework.so
#19 0x00007fc19896d114 in edm::Worker::AcquireTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>, void>::execute() () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/libFWCoreFramework.so
#20 0x00007fc198b689f9 in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/libFWCoreConcurrency.so
#21 0x00007fc197042304 in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x7fc193dbd300, waiter=..., this=0x7fc193ed7e80) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_0_pre4-el8_amd64_gcc11/build/CMSSW_13_0_0_pre4-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-0282c02a966e31ef3a1f3b1a4ea0f8fa/tbb-v2021.8.0/src/tbb/task_dispatcher.h:322
#22 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x7fc193ed7e80) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_0_pre4-el8_amd64_gcc11/build/CMSSW_13_0_0_pre4-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-0282c02a966e31ef3a1f3b1a4ea0f8fa/tbb-v2021.8.0/src/tbb/task_dispatcher.h:458
#23 tbb::detail::r1::arena::process (tls=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_0_pre4-el8_amd64_gcc11/build/CMSSW_13_0_0_pre4-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-0282c02a966e31ef3a1f3b1a4ea0f8fa/tbb-v2021.8.0/src/tbb/arena.cpp:137
#24 tbb::detail::r1::market::process (this=<optimized out>, j=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_0_pre4-el8_amd64_gcc11/build/CMSSW_13_0_0_pre4-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-0282c02a966e31ef3a1f3b1a4ea0f8fa/tbb-v2021.8.0/src/tbb/market.cpp:599
#25 0x00007fc1970444c6 in tbb::detail::r1::rml::private_worker::run (this=0x7fc19172a100) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_0_pre4-el8_amd64_gcc11/build/CMSSW_13_0_0_pre4-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-0282c02a966e31ef3a1f3b1a4ea0f8fa/tbb-v2021.8.0/src/tbb/private_server.cpp:271
#26 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fc19172a100) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_0_pre4-el8_amd64_gcc11/build/CMSSW_13_0_0_pre4-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-0282c02a966e31ef3a1f3b1a4ea0f8fa/tbb-v2021.8.0/src/tbb/private_server.cpp:221
#27 0x00007fc1961891cf in start_thread () from /lib64/libpthread.so.0
#28 0x00007fc195df5dd3 in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7fc14ffff700 (LWP 2501340) "cuda-EvtHandlr"):
#0 0x00007fc195ee0ac1 in poll () from /lib64/libc.so.6
#1 0x00007fc184809b89 in ?? () from /lib64/libcuda.so.1
#2 0x00007fc1848b0d7b in ?? () from /lib64/libcuda.so.1
#3 0x00007fc184804b98 in ?? () from /lib64/libcuda.so.1
#4 0x00007fc1961891cf in start_thread () from /lib64/libpthread.so.0
#5 0x00007fc195df5dd3 in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7fc16142b700 (LWP 2501339) "cuda-EvtHandlr"):
#0 0x00007fc195ee0ac1 in poll () from /lib64/libc.so.6
#1 0x00007fc184809b89 in ?? () from /lib64/libcuda.so.1
#2 0x00007fc1848b0d7b in ?? () from /lib64/libcuda.so.1
#3 0x00007fc184804b98 in ?? () from /lib64/libcuda.so.1
#4 0x00007fc1961891cf in start_thread () from /lib64/libpthread.so.0
#5 0x00007fc195df5dd3 in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fc167cdb700 (LWP 2501323) "cmsRun"):
#0 0x00007fc196193662 in waitpid () from /lib64/libpthread.so.0
#1 0x00007fc18f0ee517 in edm::service::cmssw_stacktrace_fork() () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#2 0x00007fc18f0ef0ca in edm::service::InitRootHandlers::stacktraceHelperThread() () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3 0x00007fc1968179b4 in std::execute_native_thread_routine (__p=0x7fc192c60590) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#4 0x00007fc1961891cf in start_thread () from /lib64/libpthread.so.0
#5 0x00007fc195df5dd3 in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fc195318640 (LWP 2501182) "cmsRun"):
#0 0x00007fc195ee0ac1 in poll () from /lib64/libc.so.6
#1 0x00007fc18f0ee80f in full_read.constprop () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#2 0x00007fc18f0ef19c in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#3 0x00007fc18f0f1b1b in sig_dostack_then_abort () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/pluginFWCoreServicesPlugins.so
#4 <signal handler called>
#5 0x00007fc0edd03cac in SDL::Event::createPixelQuintuplets (this=0x7ffc89a705c0) at Event.cu:1360
#6 0x00007fc0edcebf43 in SDL::LST::run (this=0x7fc101f59318, stream=<optimized out>, verbose=<optimized out>, see_px=..., see_py=..., see_pz=..., see_dxy=..., see_dz=..., see_ptErr=..., see_etaErr=..., see_stateTrajGlbX=..., see_stateTrajGlbY=..., see_stateTrajGlbZ=..., see_stateTrajGlbPx=..., see_stateTrajGlbPy=..., see_stateTrajGlbPz=..., see_q=..., see_hitIdx=..., ph2_detId=..., ph2_x=..., ph2_y=..., ph2_z=...) at LST.cc:140
#7 0x00007fc0fe03c93a in alpaka_cuda_async::LSTProducer::acquire(edm::Event const&, edm::EventSetup const&, edm::WaitingTaskWithArenaHolder) () from /home/users/evourlio/LSTinCMSSW/cgpu-1/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/pluginRecoTrackerLSTPluginsPortableCudaAsync.so
#8 0x00007fc198a39298 in edm::stream::doAcquireIfNeeded(edm::stream::impl::ExternalWork*, edm::Event const&, edm::EventSetup const&, edm::WaitingTaskWithArenaHolder&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/libFWCoreFramework.so
#9 0x00007fc198a3779a in edm::stream::EDProducerAdaptorBase::doAcquire(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*, edm::WaitingTaskWithArenaHolder&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/libFWCoreFramework.so
#10 0x00007fc198a0ace9 in edm::Worker::runAcquire(edm::EventTransitionInfo const&, edm::ParentContext const&, edm::WaitingTaskWithArenaHolder&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/libFWCoreFramework.so
#11 0x00007fc198a0ae7e in edm::Worker::runAcquireAfterAsyncPrefetch(std::__exception_ptr::exception_ptr, edm::EventTransitionInfo const&, edm::ParentContext const&, edm::WaitingTaskWithArenaHolder) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/libFWCoreFramework.so
#12 0x00007fc19896d114 in edm::Worker::AcquireTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>, void>::execute() () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/libFWCoreFramework.so
#13 0x00007fc198b689f9 in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/libFWCoreConcurrency.so
#14 0x00007fc1970499cd in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x7fc100e5b200, this=0x7fc193ed7e00) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_0_pre4-el8_amd64_gcc11/build/CMSSW_13_0_0_pre4-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-0282c02a966e31ef3a1f3b1a4ea0f8fa/tbb-v2021.8.0/src/tbb/task_dispatcher.h:322
#15 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=0x7fc193ed7e00) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_0_pre4-el8_amd64_gcc11/build/CMSSW_13_0_0_pre4-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-0282c02a966e31ef3a1f3b1a4ea0f8fa/tbb-v2021.8.0/src/tbb/task_dispatcher.h:458
#16 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_0_pre4-el8_amd64_gcc11/build/CMSSW_13_0_0_pre4-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-0282c02a966e31ef3a1f3b1a4ea0f8fa/tbb-v2021.8.0/src/tbb/task_dispatcher.cpp:168
#17 0x00007fc1988ec40d in edm::FinalWaitingTask::wait() () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/libFWCoreFramework.so
#18 0x00007fc1988d4211 in edm::EventProcessor::processRuns() () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/libFWCoreFramework.so
#19 0x00007fc1988e0dc6 in edm::EventProcessor::runToCompletion() () from /cvmfs/cms.cern.ch/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_0_pre4/lib/el8_amd64_gcc11/libFWCoreFramework.so
#20 0x000000000040a1bd in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#21 0x00007fc197037847 in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/auto-builds/CMSSW_13_0_0_pre4-el8_amd64_gcc11/build/CMSSW_13_0_0_pre4-build/BUILD/el8_amd64_gcc11/external/tbb/v2021.8.0-0282c02a966e31ef3a1f3b1a4ea0f8fa/tbb-v2021.8.0/src/tbb/arena.cpp:694
#22 0x000000000040b009 in main::{lambda()#1}::operator()() const ()
#23 0x000000000040971c in main ()
Current Modules:
Module: alpaka_cuda_async::LSTProducer:lstProducer (crashed)
Module: alpaka_cuda_async::LSTProducer:lstProducer
A fatal system signal has occurred: segmentation violation
Segmentation fault (core dumped)
For the multithreading case, I think the reason is already known: loadModulesFromFile is executed per event and writes to non-const globals. In the trace above, the crashing thread is inside std::map::operator[] called from loadModulesFromFile while another stream is already running SDL::LST::run.
IIUC, there are more uses of globals in other places; chances are they will crash as well once the very slow loadModulesFromFile method is made safe.
More details are in #287, item 3, and later in that discussion.
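To illustrate the kind of fix being discussed, here is a minimal sketch, not the actual TrackLooper code. It assumes the shared state is a std::map<unsigned int, float> (the type seen in the trace) and that it only needs to be filled once. All names here (gModuleValues, loadModulesOnce, lookupModuleValue, runStreams) are hypothetical. The fill is guarded by std::call_once, and later reads use find() rather than operator[], since operator[] can insert and is therefore a write.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a non-const global that a loader fills.
std::map<unsigned int, float> gModuleValues;
std::once_flag gLoadOnce;
int gLoadCount = 0;  // only modified inside the call_once body

// Fill the map exactly once, no matter how many streams call this.
void loadModulesOnce() {
  std::call_once(gLoadOnce, [] {
    ++gLoadCount;
    for (unsigned int id = 0; id < 1000; ++id) {
      gModuleValues[id] = 0.5f * static_cast<float>(id);
    }
  });
}

// Read-only lookup: find() never inserts, so concurrent calls are safe
// once the map is no longer being modified.
float lookupModuleValue(unsigned int id) {
  auto it = gModuleValues.find(id);
  return it == gModuleValues.end() ? -1.0f : it->second;
}

// Emulate nStreams streams that each request the load and then read.
// Returns how many times the load body actually ran.
int runStreams(int nStreams) {
  std::vector<std::thread> streams;
  for (int s = 0; s < nStreams; ++s) {
    streams.emplace_back([] {
      loadModulesOnce();
      (void)lookupModuleValue(10);
    });
  }
  for (auto& t : streams) {
    t.join();
  }
  return gLoadCount;
}
```

Calling runStreams(4) should run the load body once, with every stream reading the same immutable map afterwards. This also removes the per-event cost of reloading. It does not address the other globals mentioned above, which would need the same treatment or would have to become per-stream state.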