Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JBPM-10197] Handle TimerMappingInfo in error scenarios #2335

Closed
wants to merge 2 commits into from

Conversation

fjtirado
Copy link
Contributor

@fjtirado fjtirado commented Sep 15, 2023

Thank you for submitting this pull request

JIRA: (please edit the JIRA link if it exists)

link

@fjtirado fjtirado force-pushed the JBPM-10197 branch 2 times, most recently from 476aa4c to f74a8e3 Compare September 15, 2023 13:12
@fjtirado fjtirado marked this pull request as draft September 15, 2023 13:12
// this situation needs to be avoided as it should not happen
if (TRANSACTIONAL) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can lead to duplicate timers, eg. during process instance migration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I removed transactional check. If ejbTimer is null, it is impossible to remove the job, so the method should return false regardless the transactional status, isnt it?

@martinweiler
Copy link
Contributor

@fjtirado I tested this PR with the following use case:

  1. Create process with timer, using postgres DB and oid based schema
  2. Manually delete the pg_largeobject record linked to TimerMappingInfo.info to simulate an issue with the retrieval of the info blob from the database.
  3. Invoke instances/PROCID/timers API to force reading of timers

Findings: If I remove the throw calls in getEjbTimer method, I can see that the TimerMappingInfo gets recreated, which is what I expected. However, it is missing important information, such as the external id and the info column:

Old TimerMappingInfo entry:

 id |           externaltimerid            | kiesessionid | processinstanceid | timerid |  uuid  | info  
----+--------------------------------------+--------------+-------------------+---------+--------+-------
  6 | 35c5299a-c167-4ca8-bef9-0743a2f35d7b |           13 |                 3 |       2 | 13-3-2 | 18130

New TimerMappingInfo entry:

 id | externaltimerid | kiesessionid | processinstanceid | timerid |  uuid  | info 
----+-----------------+--------------+-------------------+---------+--------+------
  7 |                 |           13 |                 3 |       2 | 13-3-2 |    

@fjtirado fjtirado force-pushed the JBPM-10197 branch 2 times, most recently from e410bd9 to b8d936a Compare October 4, 2023 09:53
@fjtirado
Copy link
Contributor Author

fjtirado commented Oct 26, 2023

@martinweiler I think the behaviour you are experiencing (empty externalid and info) is essentially the same that will occur without the patch.
I mean this patch is intended to avoid the performance bottleneck of calling getTimerByName when there is not need to do so , not the problem with empty external id and info.

@fjtirado fjtirado requested a review from martinweiler October 26, 2023 12:56
@sonarqubecloud
Copy link

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

56.6% 56.6% Coverage
0.0% 0.0% Duplication

idea Catch issues before they fail your Quality Gate with our IDE extension sonarlint SonarLint

@fjtirado fjtirado marked this pull request as ready for review October 26, 2023 15:21
@fjtirado fjtirado added backport-7.67.x Generate backport PR for 7.67.x branch backport-7.67.x-blue Generate backport PR for 7.67.x-blue branch labels Oct 26, 2023
}
} catch (InvalidEJBTimer ex) {
logger.info(
"Avoiding searching all timers since the timer mapping info pointed to a timer that is not longer valid");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to have this logged at debug level and also have the jobName added so that the message provides more context

} catch (Exception e) {
logger.warn("wast not able to deserialize info field from timer info for uuid");
logger.warn("Error {} objetining the timer handle for timermappinginfo {}", e.getMessage(),
timerMappingInfo);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obtaining (?)

@albfan
Copy link
Contributor

albfan commented Oct 27, 2023

I try Martin approach to generate the problem.

Definitely, PUT timers when there're two timers active id 1 and id 2, editing id 2 removes from database timer id 1 and id 2 and id 3 refer to second timer but as this was detected on RHPAM 7.12.1, and customer do not have RHPAM-4192 fix, I get a null listing timers again.

I cannot find another action that removes the timer from database, while keeping it in process instance memory. JBPM-10198 should fix that so if that's the only source for "corrupt" a timer that will not happen again. Then this code is defensive for any situation where timer is removed from database.

This is: If there's a timermappinginfo, but timer cannot be retrieved, there's an error so do not look for timer in table, just create another one.

In that sense, code to deal with an error should not be merged, so that's why I offer the alternative PR #2351

If search for timer is not a blocker like implementing https://issues.redhat.com/browse/EAPSUP-1336 (add a column with external id from app using EJB timers) we can ignore this too as search will be really quick.

Just to make it easy to understand:

select 
  id, 
  initial_date, 
  next_date, 
  timer_state,
  encode(decode(info, 'base64'), 'escape') as info,
  (regexp_match(encode(decode(info, 'base64'), 'escape'), '([0-9]+)-([0-9]+)-(.*-[0-9]+)')) as ids, 
  (regexp_match(encode(decode(info, 'base64'), 'escape'), '[0-9]+-([0-9]+)-.*-[0-9]+'))[1] as pid,
  (regexp_match(encode(decode(info, 'base64'), 'escape'), '[-a-z0-9]+_[0-9]+.[0-9]+.[0-9]+')) as container
from jboss_ejb_timer 

This is a query that search in milliseconds what timer is related with what timerID created in RHPAM, but is a hack, we want EJB timers to offer this as a feature

@albfan
Copy link
Contributor

albfan commented Oct 27, 2023

@mareknovotny can we agreed to over #2351 as a solution for customer with this problems instead of this defensive code?

@mareknovotny
Copy link
Member

not mine decision, @krisv @fjtirado @martinweiler @gmunozfe are reviewers

@fjtirado
Copy link
Contributor Author

Im closing this one, will open a new one with alternative approach

@fjtirado fjtirado closed this Nov 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-7.67.x Generate backport PR for 7.67.x branch backport-7.67.x-blue Generate backport PR for 7.67.x-blue branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants