Today `rmdupe` is hardcoded to use `stat -c%Z` for sorting files (the `curage` and `otherage` variables). But that gives us the "last status change" timestamp, which changes even when a file is just renamed.
AFAIK, not even `touch` can set the "last status change" timestamp to a chosen value, only the access and modification timestamps. But moving, renaming or rsyncing data (with `rsync -av`) from one medium to another updates it, so I'd argue that it doesn't really show which file is older. A copy would look older than an original that was renamed or moved afterwards.
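To check this on a GNU/Linux box, here is a minimal sketch. It assumes GNU `stat` and GNU `touch`; the temporary directory and the file names are made up for illustration. Since `touch` itself modifies the inode, the status change time is already "now" before the rename, and on the usual Linux filesystems the rename sets it again.

```bash
# Sketch: a rename leaves the modification time (%Y) alone, while the
# status change time (%Z) follows the latest inode change.
# Assumes GNU stat/touch; "original" and "renamed" are hypothetical names.
demo_dir=$(mktemp -d)

# Backdate the modification time; the status change time cannot be set.
touch -d '2001-01-01 00:00:00 UTC' "$demo_dir/original"
mtime_before=$(stat -c%Y "$demo_dir/original")
ctime_before=$(stat -c%Z "$demo_dir/original")

mv "$demo_dir/original" "$demo_dir/renamed"
mtime_after=$(stat -c%Y "$demo_dir/renamed")
ctime_after=$(stat -c%Z "$demo_dir/renamed")

echo "modification time:  $mtime_before -> $mtime_after"
echo "status change time: $ctime_before -> $ctime_after"
```

The modification time stays at the backdated value, while the status change time only tells when the inode was last touched by something.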
Here I changed these 2 lines to use `stat -c%Y` instead. That gets the last data modification timestamp, which I think should be the default, or at least the first thing to be looked at in such a comparison. Copying with `cp` doesn't copy the data modification timestamp; the copy gets the copying time unless the `--preserve` parameter is used. That timestamp shows which files are really the duplicate copies, and `touch -m` allows me to control things a little bit.
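A small sketch of that `cp` behaviour, assuming GNU coreutils (`cp --preserve=timestamps`, `touch -m -d`, `stat -c`); the file names are hypothetical:

```bash
# Sketch: a plain cp gets a fresh modification time, while
# cp --preserve=timestamps carries the original one over.
# Assumes GNU coreutils; all file names are made up for the demo.
demo_dir=$(mktemp -d)

# touch -m sets only the modification time of the (new) file.
touch -m -d '2001-01-01 00:00:00 UTC' "$demo_dir/original"

cp "$demo_dir/original" "$demo_dir/plain_copy"
cp --preserve=timestamps "$demo_dir/original" "$demo_dir/preserved_copy"

mtime_original=$(stat -c%Y "$demo_dir/original")
mtime_plain=$(stat -c%Y "$demo_dir/plain_copy")
mtime_preserved=$(stat -c%Y "$demo_dir/preserved_copy")

echo "original:       $mtime_original"
echo "plain copy:     $mtime_plain"
echo "preserved copy: $mtime_preserved"
```

With `%Y` as the criterion, the plain copy is the newer file and the preserved copy ties with the original, which is where a second comparison level becomes necessary.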
I think there should be more than one comparison level for deciding which file is kept when duplicates are found. Also, there should be an option to change the criterion at every comparison level. "Older" would be just the default criterion (BTW, that would keep the `rmdupe --help` description as it is). The comparison order might be different for every level, so I think `--old` should become `--inv`, with the description "inverts the ascending/descending order for every comparison criterion (remove oldest duplicates instead of newest when used with the defaults)".
If the "data modification" timestamps are identical, there should be a second-level comparison. At that level the "status change" timestamp makes some sense, but I still think using something else, like the file name, would be better. The last comparison levels can use odd information like the inode number followed by the device ID, just to avoid randomness. In any case, these n-th level comparisons would be calling:
- `stat -c%n`: file name
- `stat -c%u`: file owner ID
- `stat -c%U`: file owner name
- `stat -c%g`: file group ID
- `stat -c%G`: file group name
- `stat -c%i`: inode number
- `stat -c%d`: device number
- `stat -c%W`: file birth timestamp (is this one useful for anyone?)
- `stat -c%X`: last access timestamp
- `stat -c%Y`: last modification timestamp
- `stat -c%Z`: last status change timestamp
These are the ones that IMHO make sense as a comparison level. There are other criteria that make sense (e.g. the name length), but most cases that need a specific comparison scheme don't need anything beyond the criteria above. The only thing still missing is whether the comparison should keep the file with the smallest or the biggest value for each criterion. A solution would be (1) sorting, (2) keeping only the first entry of the sorted result, and (3) allowing a "reversed sorting" for each criterion by using an extra suffix.
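The sort-then-keep-the-first idea could look roughly like the sketch below for a single numeric criterion. `keep_first_by` is a hypothetical helper, not something that exists in `rmdupe`; it assumes GNU `stat` and `sort`, file names without tabs or newlines, and it doesn't handle the text criteria (`n`, `U`, `G`), which would need a non-numeric sort.

```bash
# keep_first_by SPEC FILE...
# SPEC is one stat format letter, optionally followed by "r" to reverse.
# Prints the file that comes first after sorting, i.e. the one to keep.
# Hypothetical helper for illustration; numeric criteria only.
keep_first_by() {
    local spec=$1
    shift
    local letter=${spec:0:1}
    local sort_opts=(-t $'\t' -n -k1,1)
    if [[ ${spec:1:1} == r ]]; then
        sort_opts+=(-r)
    fi
    # One "value<TAB>name" line per file, sorted on the value.
    stat -c "%${letter}"$'\t'"%n" -- "$@" \
        | sort "${sort_opts[@]}" \
        | head -n1 \
        | cut -f2-
}
```

For example, `keep_first_by Y a.txt b.txt` would print whichever file has the older modification timestamp, and `keep_first_by Yr a.txt b.txt` the newer one.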
The `stat` format parameter in the list above is always a single char from `nuUgGidWXYZ`, and these would be enough for choosing the criteria order. As every criterion could be reversed, there are 2 solutions I thought of: (1) using `r` as a suffix (luckily, `%r` and `%R` aren't `stat` format parameters), or (2) using `a` or `d` to denote ascending or descending sort order, telling the user that the first file after sorting is the only one kept. With (2) every criterion would need exactly 2 chars and `d` would have 2 different meanings; with (1) a criterion can have 1 or 2 chars. Examples with both ideas:
# Last modification, name, inode, device
rmdupe --sort Ynid [...]
rmdupe --sort Yanaiada [...]
# Last modification, name reversed, inode, device
rmdupe --sort Ynrid [...]
rmdupe --sort Yandiada [...]
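Just to show that style (1) is unambiguous to parse, here is a rough sketch. `parse_sort_spec` is a made-up name and the `asc`/`desc` output format is only for illustration; nothing like it exists in `rmdupe` today.

```bash
# parse_sort_spec SPEC
# Splits a style (1) spec such as "Ynrid" into one criterion per line,
# printed as "<letter> asc" or "<letter> desc".
# Hypothetical helper; returns 1 on an unknown criterion letter.
parse_sort_spec() {
    local spec=$1
    local valid='nuUgGidWXYZ'
    local i=0 letter
    while (( i < ${#spec} )); do
        letter=${spec:i:1}
        if [[ $valid != *"$letter"* ]]; then
            echo "unknown criterion: $letter" >&2
            return 1
        fi
        i=$((i + 1))
        # An "r" right after a criterion letter reverses that criterion.
        if [[ ${spec:i:1} == r ]]; then
            echo "$letter desc"
            i=$((i + 1))
        else
            echo "$letter asc"
        fi
    done
}

parse_sort_spec Ynrid
# Y asc
# n desc
# i asc
# d asc
```

Because `r` is not a criterion letter, it can only ever be read as the suffix of the letter before it.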
I prefer (1) for the parameter. I think `Ynid` should be the default criteria. Also, `id`/`iada` should be appended as a suffix no matter what was given (i.e., inode and device numbers as the last criteria, just to avoid randomness). The name `--sort` is just an idea that makes sense to me, as I think of it as taking the `head -n1` of the `sort` command result, with `sort -r` as the reverse and an implicit `sort -n` for numbers. Still, the comparison for each ASCII criterion (file/owner/group name) could be `[[ name1 < name2 ]]` instead of `(( curage < otherage ))`, so there's no need to call `sort`.
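A rough sketch of how such a multi-level comparison between two duplicates could work without calling `sort`. `file_to_keep` is a hypothetical function, not the current `rmdupe` code; it takes the criteria already split into separate arguments (e.g. `Y nr`), always appends `i` and `d`, and assumes GNU `stat`. Note that `[[ a < b ]]` in bash compares according to the current locale.

```bash
# file_to_keep FILE1 FILE2 CRITERION...
# Each CRITERION is a stat format letter, optionally followed by "r".
# Prints the file that should be kept. Hypothetical sketch only.
file_to_keep() {
    local first=$1 second=$2
    shift 2
    local crit letter a b keep_first
    # Inode and device numbers are always the last levels.
    for crit in "$@" i d; do
        letter=${crit:0:1}
        a=$(stat -c "%$letter" -- "$first")
        b=$(stat -c "%$letter" -- "$second")
        # A tie on this level: move on to the next criterion.
        [[ $a == "$b" ]] && continue
        case $letter in
            n|U|G)  # text criteria: string comparison
                if [[ $a < $b ]]; then keep_first=1; else keep_first=0; fi ;;
            *)      # everything else is a number
                keep_first=$(( a < b )) ;;
        esac
        # The "r" suffix keeps the biggest value instead of the smallest.
        if [[ ${crit:1} == r ]]; then
            keep_first=$(( 1 - keep_first ))
        fi
        if (( keep_first )); then
            printf '%s\n' "$first"
        else
            printf '%s\n' "$second"
        fi
        return 0
    done
    # Every level tied (e.g. the same file given twice).
    printf '%s\n' "$first"
}
```

With this, `file_to_keep "$cur" "$other" Y n` would correspond to the proposed `Ynid` default.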