
[fix](sort) fix coredump by uncaught exception. DO NOT MERGE #46952

Closed


jacktengg
Contributor

@jacktengg jacktengg commented Jan 14, 2025

Do NOT merge this PR.

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

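The BE aborts with SIGABRT and dumps core: an exception thrown while the sort merger fetches the next block (BlockSupplierSortCursorImpl::has_next_block(), called from VSortedRunMerger::get_next()) is not caught, so the C++ runtime calls std::terminate(). Coredump: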

*** Query id: 8a5e4545d3cc4fee-8b5c7c7a40552fe2 ***
*** tablet id: 0 ***
*** Aborted at 1731121806 (unix time) try "date -d @1731121806" if you are using GNU date ***
*** Current BE git commitID: 856270f167 ***
*** SIGABRT unknown detail explain (@0x3e8000f1daf) received by PID 990639 (TID 2743004 OR 0x7fd0b69c6700) from PID 990639; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/selectdb-core/be/src/common/signal_handler.h:435
1# 0x00007FD86B4AB400 in /lib64/libc.so.6
2# __GI_raise in /lib64/libc.so.6
3# __GI_abort in /lib64/libc.so.6
4# __gnu_cxx::__verbose_terminate_handler() [clone .cold] at ../../../../libstdc++-v3/libsupc++/vterminate.cc:75
5# __cxxabiv1::__terminate(void (*)()) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
6# 0x000055A693ED7F01 in /opt/selectdb/3.0.10.3/be/lib/doris_be
7# 0x000055A693ED8054 in /opt/selectdb/3.0.10.3/be/lib/doris_be
8# doris::vectorized::BlockSupplierSortCursorImpl::has_next_block() in /opt/selectdb/3.0.10.3/be/lib/doris_be
9# doris::vectorized::VSortedRunMerger::has_next_block(doris::vectorized::MergeSortCursor&) in /opt/selectdb/3.0.10.3/be/lib/doris_be
10# doris::vectorized::VSortedRunMerger::get_next(doris::vectorized::Block*, bool*) at /root/selectdb-core/be/src/vec/runtime/vsorted_run_merger.cpp:193
11# doris::vectorized::VDataStreamRecvr::get_next(doris::vectorized::Block*, bool*) in /opt/selectdb/3.0.10.3/be/lib/doris_be
12# doris::vectorized::VExchangeNode::get_next(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/selectdb-core/be/src/vec/exec/vexchange_node.cpp:107
13# std::_Function_handler<doris::Status (doris::RuntimeState*, doris::vectorized::Block*, bool*), std::_Bind<doris::Status (doris::ExecNode::*(doris::ExecNode*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>))(doris::RuntimeState*, doris::vectorized::Block*, bool*)> >::_M_invoke(std::_Any_data const&, doris::RuntimeState*&&, doris::vectorized::Block*&&, bool*&&) at /root/tools/ldb-16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
14# doris::ExecNode::get_next_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*, std::function<doris::Status (doris::RuntimeState*, doris::vectorized::Block*, bool*)> const&, bool) at /root/selectdb-core/be/src/exec/exec_node.cpp:597
15# doris::PlanFragmentExecutor::get_vectorized_internal(doris::vectorized::Block*, bool*) at /root/selectdb-core/be/src/runtime/plan_fragment_executor.cpp:356
16# doris::PlanFragmentExecutor::open_vectorized_internal() in /opt/selectdb/3.0.10.3/be/lib/doris_be
17# doris::PlanFragmentExecutor::open() at /root/selectdb-core/be/src/runtime/plan_fragment_executor.cpp:265
18# doris::FragmentExecState::execute() at /root/selectdb-core/be/src/runtime/fragment_mgr.cpp:281
19# doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::RuntimeState*, doris::Status*)> const&) at /root/selectdb-core/be/src/runtime/fragment_mgr.cpp:554
20# std::_Function_handler<void (), doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0>::_M_invoke(std::_Any_data const&) at /root/tools/ldb-16/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
21# doris::ThreadPool::dispatch_thread() in /opt/selectdb/3.0.10.3/be/lib/doris_be
22# doris::Thread::supervise_thread(void*) at /root/selectdb-core/be/src/util/thread.cpp:499
23# start_thread in /lib64/libpthread.so.0
24# _GI__clone in /lib64/libc.so.6
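The diff itself is not shown on this page. As a hedged illustration of the general pattern behind a fix like this (not the actual change in this PR), the sketch below converts any exception escaping a Status-returning call into an error Status, so the exception cannot unwind past every handler and reach std::terminate(), as seen in frames 4 and 5 of the trace. Status, catch_to_status and supplier_that_throws are hypothetical stand-ins; the real doris::Status and Doris's own exception-handling helpers differ.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for doris::Status (the real class differs).
struct Status {
    bool is_ok = true;
    std::string message;

    static Status OK() { return {}; }
    static Status InternalError(std::string msg) { return {false, std::move(msg)}; }
    bool ok() const { return is_ok; }
};

// Hypothetical helper: runs `fn` and turns any exception that escapes it into an
// error Status. Without such a handler the exception keeps unwinding; if nothing
// catches it, the runtime calls std::terminate(), which aborts (SIGABRT).
Status catch_to_status(const std::function<Status()>& fn) {
    try {
        return fn();
    } catch (const std::exception& e) {
        return Status::InternalError(std::string("caught exception: ") + e.what());
    } catch (...) {
        return Status::InternalError("caught unknown exception");
    }
}

// Hypothetical block source that fails by throwing, standing in for whatever
// BlockSupplierSortCursorImpl::has_next_block() ends up calling in the trace.
Status supplier_that_throws(bool* eos) {
    *eos = false;
    throw std::runtime_error("block supplier failed");
}
```

Wrapping a call as catch_to_status([&] { return supplier_that_throws(&eos); }) yields an error Status carrying "caught exception: block supplier failed", which a caller could report as a query failure instead of crashing the BE.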

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merges this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Jan 14, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@jacktengg jacktengg force-pushed the 2.0-fix-sort-exception branch from 04cf8ff to f96eaa9 January 14, 2025 03:34
@yiguolei
Contributor

run buildall

@github-actions github-actions bot added the approved label Jan 14, 2025
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.54% (8308/21556)
Line Coverage: 30.24% (68705/227218)
Region Coverage: 29.67% (35389/119287)
Branch Coverage: 25.43% (18191/71532)
Coverage Report: http://coverage.selectdb-in.cc/coverage/f96eaa9e0f4507e0bcfd14c320c8984c5c943614_f96eaa9e0f4507e0bcfd14c320c8984c5c943614/report/index.html

@doris-robot

TPC-H: Total hot run time: 49426 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f96eaa9e0f4507e0bcfd14c320c8984c5c943614, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17752	4363	4352	4352
q2	2068	159	155	155
q3	10249	1887	1887	1887
q4	10344	1273	1340	1273
q5	8399	3902	3861	3861
q6	238	123	122	122
q7	2025	1652	1616	1616
q8	9517	2728	2711	2711
q9	10121	10031	9875	9875
q10	8677	3580	3569	3569
q11	422	250	256	250
q12	473	300	302	300
q13	19304	4007	4047	4007
q14	348	322	320	320
q15	507	470	458	458
q16	526	486	455	455
q17	1130	972	981	972
q18	7281	6840	6908	6840
q19	1707	1609	1538	1538
q20	549	323	308	308
q21	4446	4309	4140	4140
q22	534	419	417	417
Total cold run time: 116617 ms
Total hot run time: 49426 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4454	4434	4452	4434
q2	362	266	251	251
q3	4283	4257	4244	4244
q4	2805	2857	2841	2841
q5	7380	7162	7194	7162
q6	248	123	123	123
q7	3271	2871	2799	2799
q8	4415	4484	4508	4484
q9	13746	13584	13517	13517
q10	4201	4232	4244	4232
q11	780	703	689	689
q12	1025	823	844	823
q13	7317	3754	3755	3754
q14	456	426	420	420
q15	499	469	449	449
q16	631	596	608	596
q17	3837	3865	3815	3815
q18	8826	8730	8795	8730
q19	1724	1663	1662	1662
q20	2360	2119	2122	2119
q21	8559	8380	8609	8380
q22	1015	876	948	876
Total cold run time: 82194 ms
Total hot run time: 76400 ms
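Judging from the numbers, each TPC-H row lists a cold run and two hot runs, the last column is the faster of the two hot runs, and each total is a column sum (Round 1's cold column sums to 116617 ms, its last column to 49426 ms). A minimal sketch under that reading, recomputing Round 1's totals from its rows; QueryRun, best_hot_ms and summarize are illustrative names, not part of the doris-robot or tpch-tools scripts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One row of the robot's TPC-H output: a cold run plus two hot runs, in ms.
struct QueryRun {
    int64_t cold_ms;
    int64_t hot1_ms;
    int64_t hot2_ms;
};

// Best hot run for one query: the smaller of the two hot runs
// (this matches the last column of every row in Round 1).
int64_t best_hot_ms(const QueryRun& r) { return std::min(r.hot1_ms, r.hot2_ms); }

struct Totals {
    int64_t cold_ms = 0;
    int64_t hot_ms = 0;
};

// Assumed rule: total cold = sum of cold runs, total hot = sum of best hot runs.
Totals summarize(const std::vector<QueryRun>& runs) {
    Totals t;
    for (const auto& r : runs) {
        t.cold_ms += r.cold_ms;
        t.hot_ms += best_hot_ms(r);
    }
    return t;
}

// Round 1 rows (q1..q22), copied from the TPC-H comment.
const std::vector<QueryRun> kTpchRound1 = {
    {17752, 4363, 4352}, {2068, 159, 155},    {10249, 1887, 1887}, {10344, 1273, 1340},
    {8399, 3902, 3861},  {238, 123, 122},     {2025, 1652, 1616},  {9517, 2728, 2711},
    {10121, 10031, 9875}, {8677, 3580, 3569}, {422, 250, 256},     {473, 300, 302},
    {19304, 4007, 4047}, {348, 322, 320},     {507, 470, 458},     {526, 486, 455},
    {1130, 972, 981},    {7281, 6840, 6908},  {1707, 1609, 1538},  {549, 323, 308},
    {4446, 4309, 4140},  {534, 419, 417},
};
```

Under this reading, summarize(kTpchRound1) reproduces the reported 116617 ms cold and 49426 ms hot totals; the same rule appears to hold for the TPC-DS table.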

@doris-robot

TPC-DS: Total hot run time: 212972 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f96eaa9e0f4507e0bcfd14c320c8984c5c943614, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	943	421	384	384
query2	6525	2180	2203	2180
query3	6928	200	198	198
query4	23317	21462	21707	21462
query5	19757	6489	6499	6489
query6	292	236	230	230
query7	4323	304	317	304
query8	259	254	241	241
query9	3127	2711	2649	2649
query10	464	315	302	302
query11	15504	15086	15339	15086
query12	131	79	80	79
query13	1033	451	429	429
query14	17761	14328	13625	13625
query15	359	224	231	224
query16	5763	277	259	259
query17	1747	953	911	911
query18	979	328	315	315
query19	207	156	152	152
query20	102	104	109	104
query21	191	107	97	97
query22	5261	5080	4991	4991
query23	34123	33378	33532	33378
query24	6938	6330	6309	6309
query25	526	434	421	421
query26	1175	166	162	162
query27	2322	297	289	289
query28	6092	2306	2288	2288
query29	2908	2726	2740	2726
query30	243	167	168	167
query31	926	736	766	736
query32	71	66	62	62
query33	449	276	261	261
query34	850	469	479	469
query35	1120	901	930	901
query36	1296	1219	1125	1125
query37	94	60	60	60
query38	3101	2959	2899	2899
query39	1376	1321	1323	1321
query40	270	97	95	95
query41	40	38	37	37
query42	91	84	83	83
query43	572	689	631	631
query44	1170	730	722	722
query45	245	225	231	225
query46	1222	972	947	947
query47	2146	1657	1675	1657
query48	525	410	427	410
query49	642	368	374	368
query50	852	643	645	643
query51	4730	4677	4632	4632
query52	105	86	89	86
query53	221	176	192	176
query54	2642	2489	2504	2489
query55	87	84	82	82
query56	222	202	212	202
query57	1373	1146	1147	1146
query58	220	204	212	204
query59	3743	3248	3119	3119
query60	225	199	208	199
query61	97	96	93	93
query62	799	471	459	459
query63	207	176	170	170
query64	3442	1580	1443	1443
query65	3629	3519	3542	3519
query66	805	417	402	402
query67	15530	15655	15820	15655
query68	9041	643	657	643
query69	499	279	278	278
query70	1495	1409	1334	1334
query71	404	314	310	310
query72	6898	4880	4712	4712
query73	759	321	323	321
query74	6256	5878	5846	5846
query75	4559	3798	3610	3610
query76	4713	1138	1200	1138
query77	617	256	256	256
query78	12465	11803	11836	11803
query79	8077	640	632	632
query80	2048	395	374	374
query81	497	246	234	234
query82	1638	102	98	98
query83	172	133	135	133
query84	254	73	69	69
query85	1016	315	319	315
query86	346	295	317	295
query87	3225	3066	3022	3022
query88	4840	2324	2340	2324
query89	413	320	291	291
query90	1876	214	219	214
query91	168	141	125	125
query92	65	52	51	51
query93	6125	609	549	549
query94	800	212	206	206
query95	1891	1981	1922	1922
query96	657	340	325	325
query97	6490	6348	6347	6347
query98	230	210	207	207
query99	2692	809	943	809
Total cold run time: 315661 ms
Total hot run time: 212972 ms

@doris-robot

ClickBench: Total hot run time: 30.84 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f96eaa9e0f4507e0bcfd14c320c8984c5c943614, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.02	0.03	0.02
query2	0.07	0.03	0.03
query3	0.24	0.05	0.06
query4	1.77	0.07	0.07
query5	0.53	0.53	0.53
query6	1.22	0.66	0.61
query7	0.02	0.01	0.01
query8	0.04	0.03	0.02
query9	0.52	0.48	0.49
query10	0.54	0.54	0.51
query11	0.13	0.08	0.09
query12	0.13	0.10	0.10
query13	0.63	0.63	0.60
query14	0.78	0.79	0.78
query15	0.78	0.76	0.77
query16	0.36	0.36	0.39
query17	1.03	1.02	1.00
query18	0.22	0.26	0.23
query19	1.95	1.84	1.84
query20	0.01	0.02	0.01
query21	15.46	0.55	0.56
query22	2.09	2.53	1.49
query23	17.49	1.17	0.91
query24	5.75	1.62	1.46
query25	0.35	0.11	0.06
query26	0.70	0.15	0.18
query27	0.04	0.04	0.03
query28	6.33	0.75	0.70
query29	12.76	2.29	2.05
query30	0.58	0.56	0.58
query31	2.81	0.38	0.37
query32	3.36	0.49	0.51
query33	3.08	3.10	3.07
query34	15.26	4.81	4.81
query35	4.83	4.85	4.80
query36	1.05	1.02	1.01
query37	0.06	0.04	0.04
query38	0.03	0.02	0.02
query39	0.02	0.02	0.01
query40	0.16	0.14	0.14
query41	0.07	0.01	0.01
query42	0.02	0.01	0.02
query43	0.03	0.02	0.01
Total cold run time: 103.32 s
Total hot run time: 30.84 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit f96eaa9e0f4507e0bcfd14c320c8984c5c943614 with default session variables
Stream load json:         20 seconds loaded 2358488459 Bytes, about 112 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       21.7 seconds inserted 10000000 Rows, about 460K ops/s
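The load-test throughputs appear to be bytes divided by elapsed seconds, expressed in MiB (2^20 bytes) per second and truncated to an integer; the insert rate likewise appears to be rows per second in thousands, truncated. For example, 2358488459 bytes in 20 seconds is about 112.46 MiB/s. A small sketch under that assumption (the function names are hypothetical, and the elapsed times in the log are themselves rounded):

```cpp
#include <cassert>
#include <cstdint>

// Assumed formula: bytes / seconds in MiB (2^20 bytes) per second,
// truncated toward zero (17.81 MiB/s is reported as "about 17 MB/s").
int64_t mib_per_sec(uint64_t bytes, double seconds) {
    return static_cast<int64_t>(static_cast<double>(bytes) / seconds / (1024.0 * 1024.0));
}

// Assumed formula: rows / seconds in thousands, truncated ("about 460K ops/s").
int64_t k_rows_per_sec(uint64_t rows, double seconds) {
    return static_cast<int64_t>(static_cast<double>(rows) / seconds / 1000.0);
}
```

With these, mib_per_sec(2358488459, 20) gives 112 and k_rows_per_sec(10000000, 21.7) gives 460, matching the JSON stream-load and insert-into-select lines.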

@yiguolei yiguolei changed the title [fix](sort) fix coredump by uncaught exception [fix](sort) fix coredump by uncaught exception. DO NOT MERGE Jan 14, 2025
@yiguolei yiguolei closed this Jan 14, 2025
Labels: approved, reviewed