KB: add the article for potential risk with fstrim
- Also mentioned how to avoid this risk.

Signed-off-by: Vicente Cheng <[email protected]>
Co-authored-by: Kiefer Chang <[email protected]>
Co-authored-by: Eric Weber <[email protected]>
Co-authored-by: Jillian <[email protected]>
1 parent 27e6a70, commit 13b9f83. Showing 1 changed file with 58 additions and 0 deletions.
@@ -0,0 +1,58 @@
---
title: Mitigating fstrim Risk
description: The potential risk with fstrim and how to avoid it
slug: the_potential_risk_with_fstrim
authors:
- name: Vicente Cheng
title: Senior Software Engineer
url: https://github.com/Vicente-Cheng
image_url: https://github.com/Vicente-Cheng.png
tags: [harvester, rancher integration, longhorn, fstrim]
hide_table_of_contents: false
---

Using fstrim is a common way to release unused space in a filesystem. However, this utility is known to cause IO errors when used with Longhorn volumes that are rebuilding. For more information about the errors, see the following issues:

- Harvester: [Issue 4739](https://github.com/harvester/harvester/issues/4739)
- Longhorn: [Issue 7103](https://github.com/longhorn/longhorn/issues/7103)

## Risks Associated with fstrim Usage

A consequence of the IO errors caused by fstrim is that VMs using affected Longhorn volumes become stuck. This is significant because Harvester typically uses Longhorn volumes as VM disks: a VM running critical applications can suddenly become unavailable. The IO errors cause VMs to flap between running and paused states until volume rebuilding is completed.

Although this behavior does not affect data integrity, it might alarm some users. Consider the guest Kubernetes cluster scenario. In a stuck VM, the etcd service is unavailable. The failure then cascades: the Kubernetes cluster becomes unavailable, and so do the services running on the cluster.

## Risk Mitigation

One way to mitigate the described risks is to disable fstrim in VMs. fstrim is enabled by default in many modern Linux distributions.
You can determine if fstrim is enabled in VMs that use affected Longhorn volumes by checking the following:

- `/etc/fstab`: Some root filesystems mount with the *discard* option.

Example: | ||
``` | ||
/dev/mapper/rootvg-rootlv / xfs defaults,discard 0 0 | ||
``` | ||
You can disable fstrim on the root filesystem by removing the *discard* option. | ||
``` | ||
# The discard option has been removed from the mount options.
/dev/mapper/rootvg-rootlv / xfs defaults 0 0
``` | ||
After removing the *discard* option, you can remount the root filesystem using the command `mount -o remount /` or by rebooting the VM. | ||
- `fstrim.timer`: When this timer is enabled, fstrim executes weekly by default. You can either disable the timer or edit its unit file to prevent simultaneous fstrim execution on VMs.
You can disable the timer using the following command (the `--now` option also stops the running timer immediately instead of leaving it active until the next reboot):
```
systemctl disable --now fstrim.timer
```
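To prevent simultaneous fstrim execution, use the following values in the timer unit file (located at `/usr/lib/systemd/system/fstrim.timer`). `RandomizedDelaySec` delays each run by a random amount of up to the specified number of seconds, so VMs are less likely to trim at the same time: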
``` | ||
[Timer] | ||
OnCalendar=weekly | ||
AccuracySec=1h | ||
Persistent=true | ||
RandomizedDelaySec=6000 | ||
```
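
The two checks described above can be scripted. The following is a minimal sketch, not an official Harvester or Longhorn tool: the function names `fstab_discard_entries` and `fstrim_timer_state` are hypothetical, and it assumes a POSIX shell with `awk` available and, for the timer check, a systemd-based guest.

```shell
#!/bin/sh
# Hypothetical helpers for auditing a VM; names are illustrative only.

# Print "<device> <mountpoint>" for every fstab entry whose mount options
# include "discard". Usage: fstab_discard_entries [path]  (default: /etc/fstab)
fstab_discard_entries() {
    fstab="${1:-/etc/fstab}"
    # Skip comment lines; field 4 holds the comma-separated mount options.
    awk '!/^[[:space:]]*#/ && NF >= 4 {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "discard") { print $1, $2; break }
    }' "$fstab"
}

# Print the enablement state of fstrim.timer as reported by systemd.
fstrim_timer_state() {
    if command -v systemctl >/dev/null 2>&1; then
        # is-enabled exits non-zero when the unit is disabled; ignore that.
        systemctl is-enabled fstrim.timer 2>/dev/null || true
    else
        echo "systemctl not found"
    fi
}
```

Running `fstab_discard_entries` with no argument inspects `/etc/fstab`, and `fstrim_timer_state` reports whether the weekly timer is enabled. If you change the timer settings, note that files under `/usr/lib/systemd/system` can be overwritten by package updates; `systemctl edit fstrim.timer` creates a drop-in override instead, and `systemctl daemon-reload` is needed after editing a unit file by hand.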