Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in SDK shutdown #6827

Open
greatvovan opened this issue Oct 27, 2024 · 4 comments
Open

Race condition in SDK shutdown #6827

greatvovan opened this issue Oct 27, 2024 · 4 comments
Labels
Bug Something isn't working help wanted

Comments

@greatvovan
Copy link

Describe the bug
In some circumstances, after successful execution of an instrumented program, an exception is thrown on termination:

java.lang.NoClassDefFoundError: io/opentelemetry/sdk/common/CompletableResultCode
    at io.opentelemetry.sdk.trace.SpanProcessor.forceFlush (SpanProcessor.java:104)
    at io.opentelemetry.sdk.trace.SpanProcessor.shutdown (SpanProcessor.java:94)
    at io.opentelemetry.sdk.trace.TracerSharedState.shutdown (TracerSharedState.java:104)
    at io.opentelemetry.sdk.trace.SdkTracerProvider.shutdown (SdkTracerProvider.java:123)
    at io.opentelemetry.sdk.OpenTelemetrySdk.shutdown (OpenTelemetrySdk.java:111)
    at Example.lambda$main$0 (Example.java:21)
    at java.lang.Thread.run (Thread.java:1583)
Caused by: java.lang.ClassNotFoundException: io.opentelemetry.sdk.common.CompletableResultCode
    at org.codehaus.mojo.exec.URLClassLoaderBuilder$ExecJavaClassLoader.loadClass (URLClassLoaderBuilder.java:198)
    at java.lang.ClassLoader.loadClass (ClassLoader.java:526)
    at io.opentelemetry.sdk.trace.SpanProcessor.forceFlush (SpanProcessor.java:104)
    at io.opentelemetry.sdk.trace.SpanProcessor.shutdown (SpanProcessor.java:94)
    at io.opentelemetry.sdk.trace.TracerSharedState.shutdown (TracerSharedState.java:104)
    at io.opentelemetry.sdk.trace.SdkTracerProvider.shutdown (SdkTracerProvider.java:123)
    at io.opentelemetry.sdk.OpenTelemetrySdk.shutdown (OpenTelemetrySdk.java:111)
    at Example.lambda$main$0 (Example.java:21)
    at java.lang.Thread.run (Thread.java:1583)
Oct 27, 2024 5:42:11 AM io.opentelemetry.sdk.OpenTelemetrySdk shutdown
INFO: Multiple shutdown calls
[WARNING] 
java.lang.NoClassDefFoundError: io/opentelemetry/sdk/common/CompletableResultCode
    at io.opentelemetry.sdk.OpenTelemetrySdk.shutdown (OpenTelemetrySdk.java:108)
    at io.opentelemetry.sdk.OpenTelemetrySdk.close (OpenTelemetrySdk.java:119)
    at java.lang.Thread.run (Thread.java:1583)
Caused by: java.lang.ClassNotFoundException: io.opentelemetry.sdk.common.CompletableResultCode
    at org.codehaus.mojo.exec.URLClassLoaderBuilder$ExecJavaClassLoader.loadClass (URLClassLoaderBuilder.java:198)
    at java.lang.ClassLoader.loadClass (ClassLoader.java:526)
    at io.opentelemetry.sdk.OpenTelemetrySdk.shutdown (OpenTelemetrySdk.java:108)
    at io.opentelemetry.sdk.OpenTelemetrySdk.close (OpenTelemetrySdk.java:119)
    at java.lang.Thread.run (Thread.java:1583)

Steps to reproduce
I am not sure how reliable my repro, but I hope maybe Docker makes it more stable.
https://github.com/greatvovan/otel-log4j-file-config/tree/bug/class-not-found

% docker run -it --rm -v $(pwd)/otel-log4j-file-config:/prj maven bash
# cd /prj
# mvn compile
# export OTEL_LOGS_EXPORTER=none OTEL_METRICS_EXPORTER=none OTEL_TRACES_EXPORTER=none
# mvn exec:java

This reliably throws exception every time for me.

Originally I discovered the error in the following way. I have this example for Log4j Appender:
https://github.com/greatvovan/otel-log4j-file-config/tree/bug/repro-runtime-hook
(Note that OTEL_LOGS_EXPORTER is set to console for this one).

It works fine, but if I comment the rows for OpenTelemetryAppender in log4j2.xml, the error appears. As from the semantics perspective it makes absolutely no sense, I conclude that it may be related to the order or timing of how the library cleans up the resources on program exit.

If I change the shutdown hook to a synchronous call to openTelemetry.shutdown(); it solves the problem but it posts

INFO: Multiple shutdown calls

I think it is the least of the evils, as it's either an info message or nothing.

What did you expect to see?
No weird exceptions.

What did you see instead?
A confusing exception.

What version and what artifacts are you using?
Artifacts: sdk, Console Exporter
Version: (e.g., 1.43.0 or bom 2.9.0)
How did you reference these artifacts? pom.xml

Environment
Compiler: Temurin-21.0.4+7 (build 21.0.4+7-LTS)
OS: Ubuntu 24.04.1 LTS
Runtime (if different from JDK above): "21.0.4" 2024-07-16 LTS

Additional context
I found the exception harmless, but it is very confusing for the user to see it, because it leads to internals of the library and makes one think something is wrong with instrumentation. Also it complains on the class that 100% exists in the class path (the line import io.opentelemetry.sdk.common.CompletableResultCode succeeds in run time), which can make user think some dependencies are missing while it is not like that.

@greatvovan greatvovan added the Bug Something isn't working label Oct 27, 2024
@trask
Copy link
Member

trask commented Oct 27, 2024

thanks for posting the repro!

I believe this is a limitation to running the shutdown hook inside of the maven process, I think we've seen the same issue here: open-telemetry/opentelemetry-java-contrib#990

and I can reproduce it without even loading the SDK with just:

public class Example {
    public static void main(String[] args) {
        // uncomment this line to make is pass
        // Class.forName("io.opentelemetry.sdk.common.CompletableResultCode");
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                Class.forName("io.opentelemetry.sdk.common.CompletableResultCode");
            } catch (ClassNotFoundException e) {
                throw new RuntimeException(e);
            }
        }));
    }
}

and running mvn compile exec:java gives the same error you see

while mvn exec:exec -Dexec.executable=java -Dexec.args="-cp %classpath Example" succeeds because it executes in a separate process (outside of the maven process)

given that this has been reported a couple of times, it may be worth catching ClassNotFoundException inside the SDK shutdown hook and checking if we're running inside of maven, in which case we could log a more friendly warning about the flush on shutdown failing in this scenario

cc @cyrille-leclerc

@jack-berg
Copy link
Member

it may be worth catching ClassNotFoundException inside the SDK shutdown hook and checking if we're running inside of maven, in which case we could log a more friendly warning about the flush on shutdown failing in this scenario

I'd welcome a PR with this proposal

@breedx-splk
Copy link
Contributor

@trask do you have a trick for detecting if we're running inside of maven? My instinct is to check for a class, but if the classpath is being torn down can that be reliable? Some other way?

@trask
Copy link
Member

trask commented Dec 13, 2024

My instinct is to check for a class, but if the classpath is being torn down can that be reliable?

you could check for a maven class in static init (at startup) and store into static boolean that could be referenced from shutdown hook

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working help wanted
Projects
None yet
Development

No branches or pull requests

4 participants