MNT-18308 - Add flag forceAsyncAclCreation #984
base: master
…g processed in another thread
…ransactionTime exceeds, force ACL creation to be async if flag is set to true
… turned on and flag turned off
Please check your formatting settings in the IDE as I can see multiple problems in the changes.
Only one FixedAclUpdater can run at a time across the entire cluster, because it is synchronised via JobLockService; see org.alfresco.repo.domain.permissions.FixedAclUpdater#execute.
Once the lock is obtained, the AclWorkProvider counts all nodes with ContentModel.ASPECT_PENDING_FIX_ACL and then hands batches of nodes (batch size each, ordered by node ID) to the workers, tracking the min and max node IDs that were sent for processing. See org.alfresco.repo.domain.permissions.FixedAclUpdater.GetNodesWithAspectCallback.
So I'm not sure how the same node can appear in two separate batches of a single FixedAclUpdater run.
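To illustrate the reviewer's reasoning, here is a minimal sketch of keyset-style batching, assuming the pending node IDs are available as a sorted set (a stand-in for the real database query). Each batch starts strictly after the previous batch's maximum ID, so within one run the same ID cannot be handed out twice. The class and method names are hypothetical and are not Alfresco's AclWorkProvider.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: the TreeSet stands in for the query returning node IDs
// that carry the pending-fix-ACL aspect. Not Alfresco's actual AclWorkProvider.
class KeysetBatching {

    // Returns up to batchSize IDs strictly greater than afterId, in ascending order.
    static List<Long> nextBatch(TreeSet<Long> pending, long afterId, int batchSize) {
        List<Long> batch = new ArrayList<>();
        for (Long id : pending.tailSet(afterId, false)) {
            if (batch.size() == batchSize) {
                break;
            }
            batch.add(id);
        }
        return batch;
    }

    // Splits all pending IDs into batches; each batch starts after the previous max,
    // so no ID is dispensed twice in one run.
    static List<List<Long>> allBatches(TreeSet<Long> pending, int batchSize) {
        List<List<Long>> batches = new ArrayList<>();
        long maxSeen = Long.MIN_VALUE;
        while (true) {
            List<Long> batch = nextBatch(pending, maxSeen, batchSize);
            if (batch.isEmpty()) {
                return batches;
            }
            // Ascending order: the last element is the batch maximum.
            maxSeen = batch.get(batch.size() - 1);
            batches.add(batch);
        }
    }

    public static void main(String[] args) {
        TreeSet<Long> pending = new TreeSet<>(List.of(5L, 1L, 9L, 3L, 7L));
        System.out.println(allBatches(pending, 2)); // [[1, 3], [5, 7], [9]]
    }
}
```

The guarantee depends entirely on the tracked maximum being correct, which is exactly what the reply below calls into question.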
* @param nodeId
* @param asyncCall
* @return
This is not a valid JavaDoc. Please remove these lines.
Why is it required to introduce a new system.fixedACLsUpdater.forceAsyncAclCreation flag?
About the job: I found the bug by chance while trying out the job, and reproduced it by running the existing unit test (org.alfresco.repo.domain.permissions.FixedAclUpdaterTest). Even though the test passed, I kept getting NullPointerExceptions in the log, like this:
I hadn't noticed the callback setting the min node ID until you pointed it out. Currently it gets the nodes in DESC order, iterates through them and keeps resetting maxNodeId to the current node ID. The node with the lowest ID is processed LAST, so that is the one that sticks as maxNodeId, and that is where the problem comes from. A better fix than what I did would be to update maxNodeId only when the current value is greater than the previously set maxNodeId.
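The difference between the current and the proposed max tracking can be sketched as follows. This is a hypothetical illustration, not Alfresco code: it assumes the IDs of one batch arrive in descending order and that the next batch is fetched from IDs above the tracked maximum, so a maximum that is too low lets already-dispatched nodes be fetched again.

```java
import java.util.List;

// Hypothetical illustration of the maxNodeId issue described above; not Alfresco code.
class MaxNodeIdTracking {

    // Current behaviour: unconditionally overwrites the max with each visited ID.
    // With IDs in descending order, the last (lowest) ID wins.
    static long buggyMax(List<Long> idsDesc) {
        long maxNodeId = Long.MIN_VALUE;
        for (long id : idsDesc) {
            maxNodeId = id;
        }
        return maxNodeId;
    }

    // Proposed fix: only move the max forward when the current ID is larger.
    static long fixedMax(List<Long> idsDesc) {
        long maxNodeId = Long.MIN_VALUE;
        for (long id : idsDesc) {
            if (id > maxNodeId) {
                maxNodeId = id;
            }
        }
        return maxNodeId;
    }

    public static void main(String[] args) {
        List<Long> batch = List.of(40L, 30L, 20L, 10L); // DESC order, as in the callback
        System.out.println(buggyMax(batch)); // 10
        System.out.println(fixedMax(batch)); // 40
    }
}
```

Using Math.max would be equivalent; the explicit comparison mirrors the wording of the proposed fix.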
@killerboot The flag is there, and set to false by default, to keep the existing functionality. Delegating every ACL creation that takes more than 10 s to the job, without customers having a say in it, would be rather dangerous, I believe; that's why I added the flag, so that only those who want this and know the consequences can enable it (the most critical one being that ACLs will not be enforced when searching that folder and its children until the job runs). All pre-existing functionality is kept if the flag is off: currently it only delegates to the job if the call is async AND the node has children. Please note that the manage permissions call (a JS webscript) is sync. In practical terms, if we set the flag to true, then even if the call is SYNC (as when a user manages permissions through Share), once the time is exceeded we delegate all ACL creation for that tree to the job, even for nodes without children.
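The decision described above can be summarised as a small predicate. This is a minimal sketch of the behaviour as stated in the comment, not the PR's actual isSubjectToAsyncAclCreation; the parameter names are assumptions, and the sketch models the pre-existing rule only as "async AND has children", exactly as summarised here.

```java
// Hypothetical sketch of the delegation rule described above; parameter names
// are assumptions, not the PR's actual method signature.
class AsyncAclDecision {

    static boolean isSubjectToAsyncAclCreation(boolean asyncCall,
                                               boolean hasChildren,
                                               boolean transactionTimeExceeded,
                                               boolean forceAsyncAclCreation) {
        // Pre-existing behaviour (as summarised): only async calls on nodes with children.
        boolean legacyRule = asyncCall && hasChildren;
        // Opt-in behaviour: once the time limit is exceeded, defer regardless of
        // call type or whether the node has children.
        boolean forcedRule = forceAsyncAclCreation && transactionTimeExceeded;
        return legacyRule || forcedRule;
    }

    public static void main(String[] args) {
        // Sync call (e.g. from Share) on a leaf node after the time limit:
        System.out.println(isSubjectToAsyncAclCreation(false, false, true, false)); // false, flag off
        System.out.println(isSubjectToAsyncAclCreation(false, false, true, true));  // true, flag on
    }
}
```

With the flag off, the second term is always false, so the result reduces to the pre-existing rule.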
77f3602 to db71d2c
@evasques Thanks for the clarifications.
@killerboot If I understood correctly, I have to point out two possible issues with this:
I know the disadvantage of my approach is that an admin would have to restart the server to change this. On the other hand, setting it would mean that they know the impact of doing so and will also tweak the other configs so that this makes sense in their environment (like raising fixedAclMaxTransactionTime above the default 10 s, or changing when the job runs). Some systems can live with 10-30 s transactions without batting an eye if they know it's a large folder, and will expect ACLs to be created after that time. But it is bad if we frequently have trees with 6M nodes inside them and need to set ACLs for the first time. I don't know the average transaction time in that case, but it should be minutes, not seconds, and those are the ones we want to keep from happening synchronously. I think Alfresco's default behavior works fine for most customers, and this only makes sense in really big repos, and big repos have high-performance machines. In 10 s one system can add 1000 nodes in the transaction (that's my local average), but on a proper machine we can probably get 5x that. Elapsed time is hardly a measure of transaction size, so saying that all systems will go async on ACL creation after the 10 s mark means very different things depending on the system.
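The point about time not being a measure of transaction size can be made concrete with the figures quoted in the comment (about 1000 nodes per 10 s locally, roughly 5x that on production hardware, a 6M-node tree). This back-of-the-envelope sketch only multiplies and divides those figures; the rates are the commenter's estimates, not measurements.

```java
// Back-of-the-envelope arithmetic using the throughput figures quoted above.
class AclThroughputEstimate {

    // Nodes processed within a fixed time budget at a given rate.
    static long nodesWithinBudget(double nodesPerSecond, double budgetSeconds) {
        return Math.round(nodesPerSecond * budgetSeconds);
    }

    // Seconds needed to process a tree of the given size at a given rate.
    static double secondsForTree(long treeSize, double nodesPerSecond) {
        return treeSize / nodesPerSecond;
    }

    public static void main(String[] args) {
        double local = 1000 / 10.0; // 100 nodes/s (local average)
        double prod = 5 * local;    // 500 nodes/s (estimated, "5x that")
        System.out.println(nodesWithinBudget(local, 10)); // 1000
        System.out.println(nodesWithinBudget(prod, 10));  // 5000
        System.out.println(secondsForTree(6_000_000L, prod) / 60); // 200.0 (minutes)
    }
}
```

Under these estimates the same 10 s limit covers 1000 nodes on one machine and 5000 on another, and a 6M-node tree would take roughly 200 minutes even at the faster rate.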
@evasques Regarding the issues you pointed out:
If the administrator decides that long transactions are fine for the system, the configuration can be changed to allow bigger limits, but this may result in indexing issues in Search Services.
@killerboot
If fixedAclMaxTransactionTime is exceeded, force ACL creation to be async if the flag is set to true.
I created a new method, isSubjectToAsyncAclCreation, to improve code legibility and avoid an if-else hell, and changed setFixAclPending to use it. The new method keeps the previous logic and also adds the new feature:
Old logic for reference:
The new feature's intention is, if the flag is set to true, to force the node to be processed by the job later if of the following conditions are met:
I also found that the current job (FixedAclUpdater) had a bug: it is multi-threaded and did not track which nodes it had already picked up for processing. This caused the batch class to pick up nodes twice, in different threads, and if a node had meanwhile already been processed in the other thread, a NullPointerException was thrown when accessing properties that had already been unset. It did not affect the overall job behavior, but it was not efficient.
To fix this, I added a control list, currentlyProcessingNodes, so that nodes are only added to the next batch if they are not currently being processed. When a new node is picked up it goes into that list, and as soon as it is processed it is removed from it, which keeps the list size manageable (at most batchSize * numThreads entries). We do not need to keep processed nodes in that list: once processed, the aspect is removed and the properties are unset, so the job has no way of picking them up again.
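The in-flight guard described above can be sketched with a concurrent set. This is a minimal sketch, not the PR's code: node IDs stand in for NodeRefs, and the class and method names are hypothetical. It relies on Set.add returning false for an element already present, so checking and claiming a node happen in one atomic step rather than a separate contains() followed by add().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the "currentlyProcessingNodes" guard; not the PR's code.
class InFlightGuard {

    // Thread-safe set of node IDs dispatched to a worker but not yet finished.
    private final Set<Long> currentlyProcessingNodes = ConcurrentHashMap.newKeySet();

    // Builds the next batch, skipping nodes another worker is already handling.
    List<Long> claimBatch(List<Long> candidates) {
        List<Long> batch = new ArrayList<>();
        for (Long nodeId : candidates) {
            // add() returns false if the ID is already in flight, so it is skipped.
            if (currentlyProcessingNodes.add(nodeId)) {
                batch.add(nodeId);
            }
        }
        return batch;
    }

    // Called once a node has been processed (aspect removed), freeing its entry.
    void markDone(Long nodeId) {
        currentlyProcessingNodes.remove(nodeId);
    }

    int inFlight() {
        return currentlyProcessingNodes.size();
    }

    public static void main(String[] args) {
        InFlightGuard guard = new InFlightGuard();
        System.out.println(guard.claimBatch(List.of(1L, 2L, 3L))); // [1, 2, 3]
        System.out.println(guard.claimBatch(List.of(2L, 3L, 4L))); // [4]
        guard.markDone(2L);
        System.out.println(guard.inFlight()); // 3
    }
}
```

In a real job, markDone would typically be called in a finally block so that a failed node does not stay claimed for the rest of the run.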