Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add file difference check function #202

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

Gompei
Copy link

@Gompei Gompei commented Nov 27, 2021

Hello:wave:
This PR implements a function to check the difference between source and destination files based on the files present in the source.
With this function, the following process is possible.

syncManager := s3sync.New(sess)

hasDiff, err := syncManager.HasDifference(os.Args[1], os.Args[2])
if err != nil {
	panic(err)
}

if hasDiff {
        // It will only be able to run when there are differences in the files.
	err = syncManager.Sync(os.Args[1], os.Args[2])
	if err != nil {
		panic(err)
	}
} else {
	fmt.Println("There are no differences in the files.")
}

Copy link
Member

@at-wat at-wat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!

Unfortunately, ETag on S3 is not always MD5 of the file contents.
ETag can be used to check the resource is unchanged, but can't be used to check two resources have same contents, in general.
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html

ETag
The entity tag represents a specific version of the object. The ETag reflects changes only to the contents of an object, not its metadata. The ETag may or may not be an MD5 digest of the object data. Whether or not it is depends on how the object was created and how it is encrypted as described below:

  • Objects created through the AWS Management Console or by the PUT Object, POST Object, or Copy operation:
    • Objects encrypted by SSE-S3 or plaintext have ETags that are an MD5 digest of their data.
    • Objects encrypted by SSE-C or SSE-KMS have ETags that are not an MD5 digest of their object data.
  • Objects created by either the Multipart Upload or Part Copy operation have ETags that are not MD5 digests, regardless of the method of encryption.

Ref: definitions of ETag: https://datatracker.ietf.org/doc/html/rfc7232#section-2.3

@Gompei
Copy link
Author

Gompei commented Nov 27, 2021

@at-wat
Thank you for the review.
I’ll try to come up with an alternative.

If you have any good ideas, I would appreciate it if you could reply to your comments 🙏

@Gompei
Copy link
Author

Gompei commented Oct 9, 2022

Sorry for the very late commit.
Corrected to not use ETag for file diff checks. Please review again:bow:

@Gompei Gompei requested a review from at-wat October 9, 2022 10:08
Copy link
Member

@at-wat at-wat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gompei Sorry for the huge delay!

The updated algorithm seems check file size and timestamp and I think it's basically same as what the current implementation does at:

s3sync/s3sync.go

Lines 544 to 564 in baf6e27

for sourceInfo := range sourceFileChan {
destInfo, ok := destFiles[sourceInfo.name]
// source is necessary to sync if
// 1. The dest doesn't exist
// 2. The dest doesn't have the same size as the source
// 3. The dest is older than the source
if !ok || sourceInfo.size != destInfo.size || sourceInfo.lastModified.After(destInfo.lastModified) {
c <- &fileOp{fileInfo: sourceInfo}
}
if ok {
destInfo.existsInSource = true
}
}
if del {
for _, destInfo := range destFiles {
if !destInfo.existsInSource {
// The source doesn't exist
c <- &fileOp{fileInfo: destInfo, op: opDelete}
}
}
}

Is there differences from the current implementation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants