-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
in_tail: Fix a stall bug on !follow_inode case #4327
Conversation
Currently, this fails. We need to fix this problem. Signed-off-by: Daijiro Fukuda <[email protected]>
Fix #3614 Although known stall issues of in_tail on `follow_inode` case are fixed in v1.16.2, it has still a similar problem on `!follow_inode` case. In this case, a tail watcher is possible to mark the position entry as `unwatched` if it's tansitioned to `rotate_wait` state by `refresh_watcher` even if another newer tail watcher is managing it. It's hard to occur in usual because `stat_watcher` will be called immediately after the file is changed while `refresh_wather` is called every 60 seconds by default. However, there is a rare possibility that this order might be swapped especillay if in_tail is busy on processing large amount of logs. Because in_tail is single threadied, event queues such as timers or inotify will be stucked in this case. There is no such problem on `follow_inode` case because position entries are always marked as `unwatched` before entering `rotate_wait` state. Signed-off-by: Takuro Ashie <[email protected]>
https://github.com/fluent/fluentd/actions/runs/6558310821/job/17811619099?pr=4327
|
Sometimes it's also reproduced on my local GNU/Linux environment. |
Signed-off-by: Takuro Ashie <[email protected]>
We found another issue while checking this patch. |
Thanks for this fix! Indeed, there seems to be no need to unwatch the target existing in fluentd/lib/fluent/plugin/in_tail.rb Lines 567 to 570 in cf840ad
I think we will need more fundamental fixes in the future, but currently, we need this fix to solve this issue while minimizing the impact on existing logic. |
I'll check the behavior a little more tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Thanks!
Fix fluent#3614 Although known stall issues of in_tail on `follow_inode` case are fixed in v1.16.2, it has still a similar problem on `!follow_inode` case. In this case, a tail watcher is possible to mark the position entry as `unwatched` if it's tansitioned to `rotate_wait` state by `refresh_watcher` even if another newer tail watcher is managing it. It's hard to occur in usual because `stat_watcher` will be called immediately after the file is changed while `refresh_wather` is called every 60 seconds by default. However, there is a rare possibility that this order might be swapped especillay if in_tail is busy on processing large amount of logs. Because in_tail is single threadied, event queues such as timers or inotify will be stucked in this case. There is no such problem on `follow_inode` case because position entries are always marked as `unwatched` before entering `rotate_wait` state. --------- Signed-off-by: Takuro Ashie <[email protected]> Co-authored-by: Daijiro Fukuda <[email protected]>
For `refresh_watcher`, if an exisiting TailWatcher already follows a target path with the different inode, it means that the TailWatcher following the rotated file still exists. In this case, `refresh_watcher` can't start the new TailWatcher for the new current file. So, we should output a warning log in order to prevent silent collection stops, such as fluent#4327. The similar warning may work for follow_inodes too. Just limiting the case to suppress the impact to existing logic. Signed-off-by: Daijiro Fukuda <[email protected]>
For `refresh_watcher`, if an exisiting TailWatcher already follows a target path with the different inode, it means that the TailWatcher following the rotated file still exists. In this case, `refresh_watcher` can't start the new TailWatcher for the new current file. So, we should output a warning log in order to prevent silent collection stops, such as fluent#4327. The similar warning may work for follow_inodes too. Just limiting the case to suppress the impact to existing logic. Signed-off-by: Daijiro Fukuda <[email protected]>
For `refresh_watcher`, if an exisiting TailWatcher already follows a target path with the different inode, it means that the TailWatcher following the rotated file still exists. In this case, `refresh_watcher` can't start the new TailWatcher for the new current file. So, we should output a warning log in order to prevent silent collection stops, such as #4327. The similar warning may work for follow_inodes too. Just limiting the case to suppress the impact to existing logic. Signed-off-by: Daijiro Fukuda <[email protected]>
) For `refresh_watcher`, if an exisiting TailWatcher already follows a target path with the different inode, it means that the TailWatcher following the rotated file still exists. In this case, `refresh_watcher` can't start the new TailWatcher for the new current file. So, we should output a warning log in order to prevent silent collection stops, such as fluent#4327. The similar warning may work for follow_inodes too. Just limiting the case to suppress the impact to existing logic. Signed-off-by: Daijiro Fukuda <[email protected]>
Which issue(s) this PR fixes:
Fixes #3614
Closes #4264
What this PR does / why we need it:
Although known stall issues of in_tail on
follow_inode
case are fixedin v1.16.2, it has still a similar problem on
!follow_inode
case.In this case, a tail watcher is possible to mark the position entry as
unwatched
if it's tansitioned torotate_wait
state byrefresh_watcher
even if another newer tail watcher is managing it.It's hard to occur in usual because
stat_watcher
will be calledimmediately after the file is changed while
refresh_wather
is calledevery 60 seconds in default. However, there is a rare possibility that
this order might be swapped especillay if in_tail is busy on processing
large amount of logs. Because in_tail is single threadied, event queues
such as timers or inotify will be stucked in this case.
There is no such problem on
follow_inode
case because position entriesare always marked as
unwatched
before enteringrotate_wait
state.Docs Changes:
N/A
Release Note:
Same with the title