[Bug] Long-running file operations can fail internally #782

andrewazores · 2025-01-23T20:36:01Z

Current Behavior

Related to #698

cryostat/src/main/java/io/cryostat/recordings/RecordingHelper.java

Line 824 in fceff9d

try (var stream = getActiveInputStream(recording);

cryostat/src/main/java/io/cryostat/recordings/RecordingHelper.java

Line 977 in fceff9d

return remoteRecordingStreamFactory.open(recording);

cryostat/src/main/java/io/cryostat/recordings/RemoteRecordingInputStreamFactory.java

Line 37 in fceff9d

return connectionManager.executeConnectedTask(

cryostat/src/main/java/io/cryostat/targets/TargetConnectionManager.java

Line 207 in fceff9d

return executeConnectedTaskUni(target, task).await().atMost(failedTimeout);

Operations that are expected to take a long time, particularly those dealing with active recording JFR files (archiving, uploading to jfr-datasource, report generation), still use the default 10-second connection timeout. If the operation has not completed within this timeframe, the connection is failed internally, the job is aborted, and the failure is reported to the client(s). For large recordings, however, we should expect these operations to take much longer than 10 seconds.

The 10-second connection timeout is still useful for operations that are not expected to take very long, like listing event types or querying for a list of active recordings in the target, so I think those should keep using the default timeout (though maybe we should increase it from 10 seconds to, e.g., 30 seconds). Operations like archiving a recording or uploading it to jfr-datasource should use a separate and significantly longer timeout, now that we have the async job notifications API.
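As a rough illustration of the proposal, here is a minimal, self-contained sketch of a connected-task helper that keeps a short default timeout but lets long-running callers pass their own. This is not Cryostat's actual code: it uses plain `CompletableFuture` instead of Mutiny, and `TimeoutSketch`, `LONG_RUNNING_TIMEOUT`, `slowTask`, and the 10-minute value are all hypothetical names and numbers chosen for the example.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch only: these names are not part of Cryostat's API.
public class TimeoutSketch {
    // Short default for quick queries (listing event types, active recordings).
    public static final Duration DEFAULT_TIMEOUT = Duration.ofSeconds(10);
    // Assumed value for archiving/upload/report jobs; the issue does not specify one.
    public static final Duration LONG_RUNNING_TIMEOUT = Duration.ofMinutes(10);

    // Quick operations keep using the default timeout.
    public static <T> T executeConnectedTask(Supplier<T> task)
            throws InterruptedException, ExecutionException, TimeoutException {
        return executeConnectedTask(task, DEFAULT_TIMEOUT);
    }

    // Long-running operations pass an explicit, longer timeout.
    public static <T> T executeConnectedTask(Supplier<T> task, Duration timeout)
            throws InterruptedException, ExecutionException, TimeoutException {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(task);
        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Marks the future as cancelled; it does not interrupt the running task.
            future.cancel(true);
            throw e;
        }
    }

    // Stand-in for a slow file operation such as archiving a large recording.
    public static Supplier<String> slowTask(long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return "done";
        };
    }

    public static void main(String[] args) throws Exception {
        // A 200 ms task fails under a 50 ms timeout...
        try {
            executeConnectedTask(slowTask(200), Duration.ofMillis(50));
        } catch (TimeoutException e) {
            System.out.println("timed out with short timeout");
        }
        // ...but completes when the caller supplies a longer one.
        System.out.println(executeConnectedTask(slowTask(200), Duration.ofSeconds(2)));
    }
}
```

With this shape, short queries would keep calling the one-argument overload, while archive, upload, and report jobs would pass something like `LONG_RUNNING_TIMEOUT` and report completion or failure through the async job notifications.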

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- Environment:
- Version:

Anything else?

No response

andrewazores added the bug Something isn't working label Jan 23, 2025

andrewazores assigned Josh-Matsuoka Jan 23, 2025

andrewazores added this to 4.0.0 release Jan 23, 2025

andrewazores moved this to Ready in 4.0.0 release Jan 23, 2025
