[Bug] fix be core in highly concurrent queries #47410

Open
3 tasks done
yongjinhou opened this issue Jan 24, 2025 · 1 comment · May be fixed by #47411

Comments

@yongjinhou
Contributor

yongjinhou commented Jan 24, 2025

Search before asking

  • I had searched in the issues and found no similar issues.

Version

3.0.3 & master

What's Wrong?

The BE crashes with a core dump under highly concurrent queries.

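Root cause analysis: a variable captured by an asynchronously executed task is released too early. For example, with two pipeline tasks A and B, A is enqueued successfully and B fails to enqueue. B's failure makes the processing logic return, which destroys the variable captured in A (query_ctx) prematurely, so the BE cores when the thread pool later executes A.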
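The failure mode described above can be reproduced in miniature. The sketch below is not Doris code: `QueryCtx`, `BoundedQueue`, and `submit_two_tasks` are hypothetical stand-ins, and a `std::weak_ptr` is used in place of whatever non-owning capture the real lambda holds, so that the example can detect the destroyed context safely instead of dereferencing it.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for per-query state; not the real doris::QueryContext.
struct QueryCtx {
    std::string query_id;
};

// Hypothetical bounded queue whose try_submit() fails when the queue is full.
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : _capacity(capacity) {
        assert(capacity > 0);
    }

    bool try_submit(std::function<void()> task) {
        if (_tasks.size() >= _capacity) return false; // queue full: reject
        _tasks.push_back(std::move(task));
        return true;
    }

    // Runs every accepted task, as a pool thread would do at some later time.
    void run_all() {
        while (!_tasks.empty()) {
            auto task = std::move(_tasks.front());
            _tasks.pop_front();
            task();
        }
    }

private:
    std::size_t _capacity;
    std::deque<std::function<void()>> _tasks;
};

// Submits tasks A and B. When B is rejected, the function returns early and
// the only owner of the context goes out of scope while A is still queued.
bool submit_two_tasks(BoundedQueue& queue, bool& ctx_alive_when_a_ran) {
    auto ctx = std::make_shared<QueryCtx>(QueryCtx{"q1"});
    std::weak_ptr<QueryCtx> observer = ctx; // non-owning, but safely checkable
    for (int i = 0; i < 2; ++i) {
        bool ok = queue.try_submit([observer, &ctx_alive_when_a_ran] {
            ctx_alive_when_a_ran = !observer.expired();
        });
        if (!ok) return false; // early return destroys ctx
    }
    return true;
}

// Returns whether the context was still alive when task A finally ran.
bool context_alive_when_task_a_runs() {
    BoundedQueue queue(1); // room for A only, so B is rejected
    bool alive = true;
    submit_two_tasks(queue, alive);
    queue.run_all(); // A runs after the submitter has already returned
    return alive;
}
```

Calling `context_alive_when_task_a_runs()` returns `false`: task A runs after its context has been destroyed. With a raw pointer or reference capture instead of the `weak_ptr`, the same sequence would be an invalid access rather than a detectable condition.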
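Fix: the underlying cause is that the queue fills up under high concurrency. The queue is shared: it handles both fragment tasks and prepare tasks. Under high concurrency the fragment tasks pile up and prepare tasks fail to enqueue. Splitting the thread pool is therefore enough: prepare gets its own dedicated thread pool. The prepare logic must also guarantee that every task is handled successfully, so a blocking queue is used here: when the queue is full, enqueue blocks, so that all prepare tasks can…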
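The blocking enqueue described above can be sketched with a mutex and two condition variables. This is a minimal illustration under assumptions, not the implementation in the linked PR: `BlockingTaskQueue` and `run_prepare_tasks` are hypothetical names, and a single worker thread stands in for the dedicated prepare pool.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical bounded queue whose enqueue() blocks instead of failing.
class BlockingTaskQueue {
public:
    explicit BlockingTaskQueue(std::size_t capacity) : _capacity(capacity) {
        assert(capacity > 0);
    }

    // Waits while the queue is full, so a submission is never rejected.
    void enqueue(std::function<void()> task) {
        std::unique_lock<std::mutex> lock(_mutex);
        _not_full.wait(lock, [this] { return _tasks.size() < _capacity; });
        _tasks.push_back(std::move(task));
        _not_empty.notify_one();
    }

    // Waits while the queue is empty, then hands out the oldest task.
    std::function<void()> dequeue() {
        std::unique_lock<std::mutex> lock(_mutex);
        _not_empty.wait(lock, [this] { return !_tasks.empty(); });
        auto task = std::move(_tasks.front());
        _tasks.pop_front();
        _not_full.notify_one();
        return task;
    }

private:
    std::size_t _capacity;
    std::mutex _mutex;
    std::condition_variable _not_full;
    std::condition_variable _not_empty;
    std::deque<std::function<void()>> _tasks;
};

// Pushes `count` tasks through a queue of capacity 1 and returns how many ran.
int run_prepare_tasks(int count) {
    BlockingTaskQueue queue(1);
    int executed = 0; // written only by the worker, read only after join()
    std::thread worker([&queue, count] {
        for (int i = 0; i < count; ++i) queue.dequeue()();
    });
    for (int i = 0; i < count; ++i) {
        queue.enqueue([&executed] { ++executed; }); // blocks while the queue is full
    }
    worker.join();
    return executed;
}
```

Even with a capacity of one, `run_prepare_tasks(8)` runs all eight tasks, because the producer waits for space instead of giving up. The trade-off of this design is that the submitting thread can stall for as long as the prepare pool is busy.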

BE core stack (query_ctx is a null pointer in frame #6)
(gdb) bt
#0 0x00007f3293634a2a in pthread_sigmask () from /lib64/libpthread.so.0
#1 0x00007f3295425f8b in PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] () from /home/users/hyj/mywork/jdk-17.0.11/lib/server/libjvm.so
#2 0x00007f3295426a6e in JVM_handle_linux_signal () from /home/users/hyj/mywork/jdk-17.0.11/lib/server/libjvm.so
#3 <signal handler called>
#4 doris::TUniqueId::TUniqueId (this=0x7f180a8e3958, other51=...) at /home/users/hyj/mywork/Palo/baidu/third-party/palo/gensrc/build/gen_cpp/Types_types.cpp:2509
#5 0x000055764179243e in doris::QueryContext::query_id (this=0x0) at /home/users/hyj/mywork/Palo/baidu/third-party/palo/be/src/runtime/query_context.h:174
#6 doris::AttachTask::AttachTask (this=<optimized out>, query_ctx=0x0) at /home/users/hyj/mywork/Palo/baidu/third-party/palo/be/src/runtime/thread_context.cpp:60
#7 0x000055764ae631b2 in doris::pipeline::PipelineFragmentContext::_build_pipeline_tasks(doris::TPipelineFragmentParams const&, doris::ThreadPool*)::$_1::operator()() const (this=0x7f1b0d833040) at /home/users/hyj/mywork/Palo/baidu/third-party/palo/be/src/pipeline/pipeline_fragment_context.cpp:515
#8 std::__invoke_impl<void, doris::pipeline::PipelineFragmentContext::_build_pipeline_tasks(doris::TPipelineFragmentParams const&, doris::ThreadPool*)::$_1&>(std::__invoke_other, doris::pipeline::PipelineFragmentContext::_build_pipeline_tasks(doris::TPipelineFragmentParams const&, doris::ThreadPool*)::$_1&) (__f=...) at /home/users/palo/develop_env/ldb_toolchain_set/ldb_toolchain_v0.17/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61
#9 std::__invoke_r<void, doris::pipeline::PipelineFragmentContext::_build_pipeline_tasks(doris::TPipelineFragmentParams const&, doris::ThreadPool*)::$_1&>(doris::pipeline::PipelineFragmentContext::_build_pipeline_tasks(doris::TPipelineFragmentParams const&, doris::ThreadPool*)::$_1&) (__fn=...) at /home/users/palo/develop_env/ldb_toolchain_set/ldb_toolchain_v0.17/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:111
#10 std::_Function_handler<void (), doris::pipeline::PipelineFragmentContext::_build_pipeline_tasks(doris::TPipelineFragmentParams const&, doris::ThreadPool*)::$_1>::_M_invoke(std::_Any_data const&) (__functor=...) at /home/users/palo/develop_env/ldb_toolchain_set/ldb_toolchain_v0.17/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
#11 0x0000557641925428 in doris::ThreadPool::dispatch_thread (this=0x7f3200714c00) at /home/users/hyj/mywork/Palo/baidu/third-party/palo/be/src/util/threadpool.cpp:543
#12 0x000055764191a3a1 in std::function<void ()>::operator()() const (this=0x7f180a8e3958) at /home/users/palo/develop_env/ldb_toolchain_set/ldb_toolchain_v0.17/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560
#13 doris::Thread::supervise_thread (arg=0x7f3209620b00) at /home/users/hyj/mywork/Palo/baidu/third-party/palo/be/src/util/thread.cpp:498
#14 0x00007f329362fea5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f329405eb0d in clone () from /lib64/libc.so.6

What You Expected?

The BE executes queries normally.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@yongjinhou yongjinhou linked a pull request Jan 24, 2025 that will close this issue
16 tasks
@Gabriel39
Contributor

Could you show the complete core stack?
