branch-2.0: [Bug](fix) Fix QueryStatistics thread-unsafe #46980

Merged: 1 commit into apache:branch-2.0 on Jan 14, 2025

xinyiZzz
Contributor

What problem does this PR solve?

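Fixes a BE crash (SIGSEGV) in QueryStatisticsRecvr::insert, reached from VDataStreamRecvr::remove_sender on a pipeline task thread, caused by thread-unsafe access to QueryStatistics: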

*** Query id: fe05209ae622498b-9a2710f32b3dbcac ***
*** tablet id: 0 ***
*** Aborted at 1736818365 (unix time) try "date -d @1736818365" if you are using GNU date ***
*** Current BE git commitID: f129a5c1278 ***
*** SIGSEGV address not mapped to object (@0x35) received by PID 2827018 (TID 2827438 OR 0xffff00d47f40) from PID 53; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_enterprise/doris/be/src/common/signal_handler.h:417
 1# os::Linux::chained_handler(int, siginfo_t*, void*) in /software/selectdb_doris/java8/jre/lib/aarch64/server/libjvm.so
 2# JVM_handle_linux_signal in /software/selectdb_doris/java8/jre/lib/aarch64/server/libjvm.so
 3# signalHandler(int, siginfo_t*, void*) in /software/selectdb_doris/java8/jre/lib/aarch64/server/libjvm.so
 4# 0x0000FFFFF7FB07BC in linux-vdso.so.1
 5# doris::QueryStatisticsRecvr::insert(std::shared_ptr<doris::QueryStatistics>, int) at /home/zcp/repo_center/doris_enterprise/doris/be/src/runtime/query_statistics.cpp:128
 6# doris::vectorized::VDataStreamRecvr::remove_sender(int, int, std::shared_ptr<doris::QueryStatistics>) at /home/zcp/repo_center/doris_enterprise/doris/be/src/vec/runtime/vdata_stream_recvr.cpp:416
 7# doris::vectorized::Channel::send_local_block(bool, doris::Status const&) at /home/zcp/repo_center/doris_enterprise/doris/be/src/vec/sink/vdata_stream_sender.cpp:138
 8# doris::vectorized::PipChannel::send_current_block(bool, doris::Status const&) at /home/zcp/repo_center/doris_enterprise/doris/be/src/vec/sink/vdata_stream_sender.h:496
 9# doris::vectorized::Channel::close_internal(doris::Status const&) at /home/zcp/repo_center/doris_enterprise/doris/be/src/vec/sink/vdata_stream_sender.cpp:287
10# doris::vectorized::Channel::close(doris::RuntimeState*, doris::Status const&) at /home/zcp/repo_center/doris_enterprise/doris/be/src/vec/sink/vdata_stream_sender.cpp:317
11# doris::vectorized::VDataStreamSender::try_close(doris::RuntimeState*, doris::Status) at /home/zcp/repo_center/doris_enterprise/doris/be/src/vec/sink/vdata_stream_sender.cpp:707
12# doris::pipeline::DataSinkOperator<doris::pipeline::ExchangeSinkOperatorBuilder>::try_close(doris::RuntimeState*) at /home/zcp/repo_center/doris_enterprise/doris/be/src/pipeline/exec/operator.h:293
13# doris::pipeline::PipelineTask::try_close() at /home/zcp/repo_center/doris_enterprise/doris/be/src/pipeline/pipeline_task.cpp:315
14# doris::pipeline::TaskScheduler::_try_close_task(doris::pipeline::PipelineTask*, doris::pipeline::PipelineTaskState) at /home/zcp/repo_center/doris_enterprise/doris/be/src/pipeline/task_scheduler.cpp:342
15# doris::pipeline::TaskScheduler::_do_work(unsigned long) at /home/zcp/repo_center/doris_enterprise/doris/be/src/pipeline/task_scheduler.cpp:298
16# doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_enterprise/doris/be/src/util/threadpool.cpp:541
17# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_enterprise/doris/be/src/util/thread.cpp:499
18# 0x0000FFFFF7EC87A0 in /usr/lib64/libpthread.so.0
19# 0x0000FFFFF7D2BCBC in /usr/lib64/libc.so.6
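
The trace above shows QueryStatisticsRecvr::insert being reached from VDataStreamRecvr::remove_sender on a pipeline task thread. The patch itself is not reproduced in this conversation. As a minimal sketch of the general kind of fix the title suggests, assuming the receiver keeps per-sender statistics in a shared map that several task threads may write to at once, the map can be guarded by a mutex. `Stats`, `StatsRecvr`, and `run_concurrent_inserts` are hypothetical names for illustration, not Doris code:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for doris::QueryStatistics; only one counter is modeled.
struct Stats {
    int64_t scan_rows = 0;
};

// Hypothetical stand-in for QueryStatisticsRecvr: a map of per-sender
// statistics that several task threads may update at the same time.
class StatsRecvr {
public:
    // Store the statistics reported by one sender. The lock serializes
    // concurrent writers; std::map gives no such guarantee on its own.
    void insert(std::shared_ptr<Stats> stats, int sender_id) {
        if (stats == nullptr) return;
        std::lock_guard<std::mutex> guard(_lock);
        _by_sender[sender_id] = std::move(stats);
    }

    // Sum the counter across all senders under the same lock.
    int64_t total_scan_rows() const {
        std::lock_guard<std::mutex> guard(_lock);
        int64_t total = 0;
        for (const auto& entry : _by_sender) total += entry.second->scan_rows;
        return total;
    }

private:
    mutable std::mutex _lock;
    std::map<int, std::shared_ptr<Stats>> _by_sender;
};

// Each thread plays the role of one sender closing its channel and
// handing its statistics to the shared receiver.
int64_t run_concurrent_inserts(int num_senders) {
    assert(num_senders >= 0);
    StatsRecvr recvr;
    std::vector<std::thread> threads;
    for (int id = 0; id < num_senders; ++id) {
        threads.emplace_back([&recvr, id] {
            auto stats = std::make_shared<Stats>();
            stats->scan_rows = 10;
            recvr.insert(std::move(stats), id);
        });
    }
    for (auto& t : threads) t.join();
    return recvr.total_scan_rows();
}
```

Without the lock, two threads modifying the map concurrently is a data race, which can corrupt the tree's internal pointers and later surface as a SIGSEGV at a small invalid address like the one reported above.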

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Jan 14, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@xinyiZzz
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.57% (8314/21555)
Line Coverage: 30.27% (68778/227216)
Region Coverage: 29.70% (35420/119278)
Branch Coverage: 25.45% (18206/71530)
Coverage Report: http://coverage.selectdb-in.cc/coverage/cd1836b90adc192a3b38a01770b61403263888d6_cd1836b90adc192a3b38a01770b61403263888d6/report/index.html

@doris-robot

TPC-H: Total hot run time: 48961 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit cd1836b90adc192a3b38a01770b61403263888d6, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17824	4415	4324	4324
q2	2050	152	144	144
q3	10461	1899	1885	1885
q4	10348	1234	1289	1234
q5	8352	3823	3907	3823
q6	231	123	125	123
q7	2018	1620	1610	1610
q8	9283	2702	2681	2681
q9	10300	9848	9800	9800
q10	8614	3516	3506	3506
q11	422	247	255	247
q12	471	290	301	290
q13	18350	3967	4007	3967
q14	356	330	333	330
q15	505	457	455	455
q16	530	451	458	451
q17	1118	953	914	914
q18	7210	6979	6921	6921
q19	1680	1559	1487	1487
q20	540	303	293	293
q21	4480	4082	4093	4082
q22	512	394	403	394
Total cold run time: 115655 ms
Total hot run time: 48961 ms
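
The two totals above are consistent with a simple rule inferred from the numbers: the cold total is the sum of the first timing column, and the hot total is the sum of the smaller of the next two timings for each query (which is what the last column holds). The following is only a sketch of that arithmetic over the Round 1 rows, not the benchmark tool's own script; `QueryTiming`, `tpch_round1`, `total_cold_ms`, and `total_hot_ms` are hypothetical names:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One table row: the cold run and the two hot runs, in milliseconds.
struct QueryTiming {
    int64_t cold_ms;
    int64_t hot1_ms;
    int64_t hot2_ms;
};

// Round 1 timings for q1..q22, copied from the table above.
const std::vector<QueryTiming>& tpch_round1() {
    static const std::vector<QueryTiming> rows = {
        {17824, 4415, 4324}, {2050, 152, 144},    {10461, 1899, 1885},
        {10348, 1234, 1289}, {8352, 3823, 3907},  {231, 123, 125},
        {2018, 1620, 1610},  {9283, 2702, 2681},  {10300, 9848, 9800},
        {8614, 3516, 3506},  {422, 247, 255},     {471, 290, 301},
        {18350, 3967, 4007}, {356, 330, 333},     {505, 457, 455},
        {530, 451, 458},     {1118, 953, 914},    {7210, 6979, 6921},
        {1680, 1559, 1487},  {540, 303, 293},     {4480, 4082, 4093},
        {512, 394, 403},
    };
    return rows;
}

// Sum of the cold-run column.
int64_t total_cold_ms(const std::vector<QueryTiming>& rows) {
    int64_t total = 0;
    for (const auto& r : rows) total += r.cold_ms;
    return total;
}

// Sum of the better (smaller) of the two hot runs per query.
int64_t total_hot_ms(const std::vector<QueryTiming>& rows) {
    int64_t total = 0;
    for (const auto& r : rows) total += std::min(r.hot1_ms, r.hot2_ms);
    return total;
}
```

Applied to the Round 1 rows, `total_cold_ms` gives 115655 and `total_hot_ms` gives 48961, matching the reported totals.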

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4303	4285	4270	4270
q2	324	225	219	219
q3	4156	4165	4124	4124
q4	2741	2731	2735	2731
q5	7122	7109	7057	7057
q6	241	121	118	118
q7	3250	2832	2823	2823
q8	4336	4460	4481	4460
q9	13614	13524	13648	13524
q10	4277	4223	4251	4223
q11	749	704	683	683
q12	1019	875	872	872
q13	7622	3770	3771	3770
q14	461	429	430	429
q15	495	461	458	458
q16	633	596	607	596
q17	3819	3791	3874	3791
q18	8837	8665	8729	8665
q19	1731	1599	1630	1599
q20	2375	2110	2110	2110
q21	8528	8463	8382	8382
q22	1011	953	875	875
Total cold run time: 81644 ms
Total hot run time: 75779 ms

@doris-robot

TPC-DS: Total hot run time: 212418 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit cd1836b90adc192a3b38a01770b61403263888d6, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	948	432	382	382
query2	6539	2209	2269	2209
query3	6919	204	204	204
query4	23894	21688	21498	21498
query5	19751	6464	6478	6464
query6	282	215	247	215
query7	4158	302	310	302
query8	275	246	234	234
query9	3061	2657	2575	2575
query10	419	314	312	312
query11	15555	14884	14916	14884
query12	129	74	76	74
query13	1019	429	434	429
query14	17097	13356	13710	13356
query15	387	214	232	214
query16	6154	279	262	262
query17	1753	914	906	906
query18	876	319	309	309
query19	215	164	161	161
query20	106	99	98	98
query21	190	99	97	97
query22	5108	5159	4870	4870
query23	34130	33306	33279	33279
query24	7007	6288	6282	6282
query25	535	426	424	424
query26	823	161	161	161
query27	2352	291	294	291
query28	6106	2253	2244	2244
query29	2850	2761	2723	2723
query30	249	165	163	163
query31	983	745	781	745
query32	75	63	60	60
query33	432	265	271	265
query34	882	512	481	481
query35	1081	905	926	905
query36	1183	1245	1356	1245
query37	93	61	62	61
query38	3078	2957	2984	2957
query39	1373	1345	1328	1328
query40	213	100	97	97
query41	41	37	38	37
query42	86	85	91	85
query43	602	603	585	585
query44	1225	702	714	702
query45	246	228	231	228
query46	1233	986	963	963
query47	1945	1710	1806	1710
query48	513	410	404	404
query49	606	363	377	363
query50	848	588	622	588
query51	4754	4762	4694	4694
query52	88	79	84	79
query53	230	182	195	182
query54	2658	2470	2481	2470
query55	87	87	87	87
query56	232	223	219	219
query57	1218	1173	1136	1136
query58	224	208	214	208
query59	3505	3313	3312	3312
query60	215	207	206	206
query61	96	94	97	94
query62	829	448	504	448
query63	200	178	170	170
query64	3292	1588	1419	1419
query65	3611	3544	3529	3529
query66	752	436	422	422
query67	16105	16341	15551	15551
query68	8981	627	646	627
query69	493	276	275	275
query70	1543	1529	1385	1385
query71	371	302	315	302
query72	6844	4783	4299	4299
query73	780	319	316	316
query74	6287	5770	5792	5770
query75	4659	3737	3743	3737
query76	4572	1139	1218	1139
query77	560	262	258	258
query78	12488	12016	20767	12016
query79	5276	622	637	622
query80	1137	391	388	388
query81	492	236	235	235
query82	273	99	101	99
query83	168	138	132	132
query84	254	70	71	70
query85	941	318	324	318
query86	352	288	307	288
query87	3251	3021	3011	3011
query88	3845	2295	2304	2295
query89	348	286	272	272
query90	1831	214	219	214
query91	165	124	134	124
query92	63	53	52	52
query93	986	580	560	560
query94	758	211	215	211
query95	2015	1991	2019	1991
query96	640	319	326	319
query97	6478	6344	6401	6344
query98	242	198	202	198
query99	2948	920	894	894
Total cold run time: 303762 ms
Total hot run time: 212418 ms

@doris-robot

ClickBench: Total hot run time: 31.45 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit cd1836b90adc192a3b38a01770b61403263888d6, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.03
query2	0.07	0.03	0.02
query3	0.25	0.04	0.04
query4	1.78	0.07	0.06
query5	0.53	0.53	0.53
query6	1.26	0.62	0.62
query7	0.01	0.00	0.01
query8	0.03	0.03	0.02
query9	0.52	0.50	0.49
query10	0.53	0.56	0.55
query11	0.12	0.09	0.09
query12	0.13	0.09	0.10
query13	0.63	0.62	0.60
query14	0.79	0.79	0.78
query15	0.80	0.79	0.77
query16	0.38	0.37	0.38
query17	1.04	1.01	1.01
query18	0.23	0.25	0.24
query19	1.95	1.86	1.77
query20	0.02	0.01	0.01
query21	15.44	0.55	0.57
query22	2.41	2.81	1.67
query23	16.45	1.00	0.88
query24	5.57	1.80	1.55
query25	0.36	0.16	0.04
query26	0.63	0.16	0.15
query27	0.05	0.05	0.04
query28	6.78	0.76	0.71
query29	12.73	2.33	2.41
query30	0.56	0.54	0.55
query31	2.81	0.40	0.39
query32	3.33	0.53	0.51
query33	3.24	3.15	3.10
query34	15.26	4.89	4.79
query35	4.89	4.87	4.85
query36	1.07	1.02	1.01
query37	0.06	0.04	0.05
query38	0.05	0.02	0.03
query39	0.02	0.02	0.01
query40	0.17	0.14	0.14
query41	0.07	0.02	0.01
query42	0.02	0.02	0.01
query43	0.03	0.02	0.02
Total cold run time: 103.1 s
Total hot run time: 31.45 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit cd1836b90adc192a3b38a01770b61403263888d6 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      33 seconds loaded 861443392 Bytes, about 24 MB/s
Insert into select:       22.2 seconds inserted 10000000 Rows, about 450K ops/s

xinyiZzz changed the title from "branch-3.0: [Bug](fix) Fix QueryStatistics thread-unsafe" to "branch-2.0: [Bug](fix) Fix QueryStatistics thread-unsafe" on Jan 14, 2025
wm1581066 added the "usercase" label (Important user case type label) on Jan 14, 2025
yiguolei merged commit 6701bc7 into apache:branch-2.0 on Jan 14, 2025
19 of 22 checks passed