Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing #165 #167

Merged
merged 30 commits into from
Jan 5, 2021
Merged

Fixing #165 #167

merged 30 commits into from
Jan 5, 2021

Conversation

anjohnson
Copy link
Member

Incomplete, hoping this PR will trigger an Appveyor build...

@AppVeyorBot
Copy link

@anjohnson anjohnson linked an issue Jun 30, 2020 that may be closed by this pull request
@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.12 failed (commit 6bdc5fc043 by @anjohnson)

@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.16 failed (commit 448cc527ac by @anjohnson)

@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.18 failed (commit 6188e43175 by @anjohnson)

@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.19 failed (commit d5cd826d62 by @anjohnson)

@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.20 failed (commit b81c4dc135 by @anjohnson)

@anjohnson anjohnson marked this pull request as ready for review December 1, 2020 23:49
@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.21 failed (commit cbf4c97ff8 by @anjohnson)

@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.22 failed (commit b42fd91bae by @anjohnson)

@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.23 failed (commit 31cab43e42 by @anjohnson)

src/ca/caContext.h Outdated Show resolved Hide resolved
src/ca/notifierConveyor.cpp Outdated Show resolved Hide resolved
@mdavidsaver
Copy link
Member

@anjohnson fyi. If you rebase this branch you should (for the 7.0 builds) have automatic analysis of any tests which crash.

Loop the tests so we check for destroying the caProvider.
Move the shutdown call inside the try/catch.
Moved duplicated EXCEPTION_GUARD macro to private header file.
Removed all DEBUG_LEVEL and similar prints.
Some reformatting and code simplification.
This passes the tests (on MacOS).
Channel connection notifications are now handled by connectNotifier,
getDone, putDone and monitor events handled by resultNotifier.
A notifierConveyor is generic, and contains a queue and a thread.
You pass a Notification pointing to a NotifierClient to a Conveyor's
notifyClient() method, and the thread will call the client's
notifyClient() method once it reaches the front of the queue.

The conveyor threads stop when the caProvider is destroyed.
The queue stores weak pointers, so queued notifications won't prevent
client objects from being destroyed.
Each instance of the caContext class represents a separate CA context,
so each CAChannelProvider creates one and keeps a shared_ptr to it,
making that available to its channels and channel operations. These
also take their own shared_ptr to it as well so the context cannot be
destroyed while it might be needed.

A related caContext Attach object is intended to be short-lived, and
to be allocated on the stack. When created it saves the current CA
context for the thread, replacing it from the caContext given to its
constructor. CA operations will now use the attached context. When the
Attach destructor runs it detaches the thread from the current context
(checking still has the expected value) and re-attaches the thread to
any context that was saved by the constructor.
@anjohnson
Copy link
Member Author

Assuming the Appveyor builds don't fail, I think this is now done. I added a separate test program for the notifyConveyor, and restructured the run() method as you suggested; I gave up trying to allow a client to be able to delete the conveyor in its notifyClient() method, if a client tries that now the notifier thread calls cantProceed().

This PR should also fix #163 and #83.

@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.28 failed (commit 512c477b0e by @anjohnson)

@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.29 failed (commit 20f60679e3 by @anjohnson)

@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.30 failed (commit 3342b91076 by @anjohnson)

@mdavidsaver
Copy link
Member

The 3.15 DLL builds fail with:

dbdToPv.obj : error LNK2001: unresolved external symbol dbr_value_offset
dbdToPv.obj : error LNK2001: unresolved external symbol dbf_text_dim
pvAccessCA.dll : fatal error LNK1120: 2 unresolved externals

wrt.

... fatal error LNK1318: Unexpected PDB error; OK (0) ...

I've yet to find a good explanation for what is happening. LNK1318 seems to mean something went wrong with a (the?) local debug symbol server. Given the intermittent occurrences, one "resolution" stands out to me:

  • Serialize linking to mitigate parallel link issues if needed. This error can be caused if mspdbsrv.exe is launched by one instance of link, and is shut down before another instance of link is done using it. The downside to this fix is that your project builds may take considerably longer to complete.

Maybe this could be avoided by preemptively launching a single instance? (MSVC debug symbol handling is so complicated...)

@AppVeyorBot
Copy link

@mdavidsaver
Copy link
Member

mdavidsaver commented Jan 4, 2021

The appveyor build passed once. I've restarted it, and will likely do so a couple of more times, in an attempt to provoke a hang. If not, then I'll merge.

@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.32 failed (commit 61a224643b by @anjohnson)

@anjohnson
Copy link
Member Author

This build failure occurred while linking testGetPerformance.exe:

[00:08:01] make[2]: Entering directory 'C:/projects/pvaccesscpp/testApp/O.windows-x64-static'
[00:08:01] link -nologo  -debug -incremental:no -fixed:no -MACHINE:X64             -out:testGetPerformance.exe   testGetPerformance.obj      ../../lib/windows-x64-static/pvAccess.lib  C:/Users/appveyor/.cache/pvdata-master/lib/windows-x64-static/pvData.lib  C:/Users/appveyor/.cache/base-7.0/lib/windows-x64-static/Com.lib netapi32.lib netapi32.lib ws2_32.lib ws2_32.lib advapi32.lib user32.lib kernel32.lib winmm.lib dbghelp.lib     
[00:08:01] pvData.lib(createRequest.obj) : warning LNK4204: 'C:\projects\pvaccesscpp\testApp\O.windows-x64-static\vc140.pdb' is missing debugging information for referencing module; linking object as if no debug info
...
[00:08:01] Com.lib(osdSockAddrReuse.obj) : warning LNK4204: 'C:\projects\pvaccesscpp\testApp\O.windows-x64-static\vc140.pdb' is missing debugging information for referencing module; linking object as if no debug info
[00:08:01] Com.lib(epicsString.obj) : fatal error LNK1318: Unexpected PDB error; OK (0) 'C:\projects\pvaccesscpp\testApp\O.windows-x64-static\vc140.pdb'
[00:08:01] make[2]: *** [C:/Users/appveyor/.cache/base-7.0/configure/RULES_BUILD:212: testGetPerformance.exe] Error 1318
[00:08:01] make[2]: Leaving directory 'C:/projects/pvaccesscpp/testApp/O.windows-x64-static'

In an earlier failed build the same random error has happened while linking testPVUnion.exe which is in pvData, so I'm not convinced this has anything to do with caProvider as such. We could try using the -Z7 compiler flag instead of -Zi which embeds the debugging symbols into the object files instead of using the separate shared database, but that flag belongs in base/configure/os/CONFIG.win32-x86.win32-x86 and not in the pvAccess module at all. This might solve both the LNK4204 and LNK1318 messages.

Thanks for re-triggering the build just now, I still don't have the ability to manage/configure the projects in the epics-base account.

@mdavidsaver
Copy link
Member

I'm not convinced this has anything to do with caProvider as such.

Well I am convinced that it has nothing whatsoever to do with caProvider. I see this error occasionally with other projects. It just seems to happen more often with pvAccessCPP. Go figure. Another CI nuisance which will eventually be sorted out.

@anjohnson
Copy link
Member Author

We appear to have switched between -Zi and -Z7 and back at least once in the past, so maybe it wouldn't help.

@mdavidsaver
Copy link
Member

If you are looking for something to try, /FS might be interesting. In truth though, this particular issue doesn't seem so significant to me. It's easy to identify as a build "failure". I'll be more interested to get some details about whatever makes tests ~randomly crash on exit.

@AppVeyorBot
Copy link

Build pvAccessCPP 1.0.33 failed (commit 61a224643b by @anjohnson)

@AppVeyorBot
Copy link

@mdavidsaver
Copy link
Member

Four runs with no hang. That's good enough for now.

@mdavidsaver mdavidsaver merged commit 4638c11 into epics-base:master Jan 5, 2021
@anjohnson anjohnson deleted the fix-165 branch January 5, 2021 19:46
@anjohnson
Copy link
Member Author

Thanks.

We already use -FS on the compiler command-line for debug builds, which is where the failures have been happening:

cl -EHsc -GR -DNOMINMAX -nologo -FC -D__STDC__=0 -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -RTCsu -Zi -FS -W3 -w44355 -w44344 -w44251 -D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING -MTd -TP ...

The error is coming from the linker and I don't see any relevant flags for that.

@mdavidsaver
Copy link
Member

fatal error LNK1318: Unexpected PDB error; OK (0)

I've been paying more attention to these failures in the past week. I've only seen this error with VS 2017 builds. Not 2015 nor 2019. Maybe this really is a toolchain bug?

@anjohnson
Copy link
Member Author

Possibly, this might confirm that. They all seem to be 2017 static debug builds; maybe we should just exclude that particular combination?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

testCaProvider deadlock
3 participants