-
-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
making the option --limit in pull usable. #165
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will only work once. Next pulls you have to skip the limit option.
Yes, it is unnecessary after the first pull. Don't you think so? After the first pull, limit option does not make sense as only the new mail are pulled over. It also removes the need to associate it with any other option. |
Lattitude75 writes on July 15, 2020 9:37:
> This will only work once. Next pulls you have to skip the limit option.
Yes, it is unnecessary after the first pull. Don't you think so? After the first pull, limit option does not make sense as only the new mail are pulled over. It also removes the need to associate it with any other option.
No, sometimes all email is pulled. E.g. if the historyId is too old. Or if
someone does `gmi pull -f`. The logic in `partial_pull` expects a successful
full pull to have been done, there might be some implications there. I think
a better approach is the PR that also uses a query, so that you have a remote
(sub-query) that matches the local state in a consistent way. It is better to be
a bit slower and inflexible than risk ending up in inconsistent state. The
reason for rejecting the original PR was that it introduced significantly more
complexity, which I was not certain did not break the state. The `--limit`
option is worse than that PR, and thus should probably be removed.
|
No, it is the full_pull function where the --limit is implemented as shown in the excerpt from it below.
It is the mset value which is limited in the full_pull. And since this is the very function which is called in case of a forced pull or if the historyId is too old, it should not cause any inconsistencies. Further, a full_pull with the --limit option is considered as a successful full pull by the partial_pull function and I have been able to use it for the past few days now without any inconsistency or a problem. Thus, the --limit option works, if the --limit variable can be made to be a global variable in the .gmailieer.json file (as I have added to this PR). Do let me know if what I said seems right. |
I do not think that any full pull with a limit variable defined is going to pull all the email. Looking at the full_pull function:
The variable mset is limited using the limit variable. So, if we make sure that limit is added to the set variables in .gmailieer.json, the full pull should never cause a problem. Even the scenarios that you listed, both forced and a very old historyId, call the very full_pull function. I have added the commit to add limit to the full_pull variables in this PR. Thus, the end user can just use the |
Sorry, closed it by mistake! |
I meant if you do another sync later and do not include the `--limit` flag you
will get a lot of emails if a full sync is needed. In that sense it makes sense
to have --limit only as a set option. Maybe that is feasible to do in a good
way.
Partial pulls will not do any limiting, and will not know which emails to remove
locally. That means that you will get a different result from doing many small
partial pulls and a full pull. That is not so good and something that I strive
to avoid. Is this possible to achieve without fetching up to limit number of
emails when doing partial pull?
|
The last commit I added should fix the problem with the partial pull. It makes sure that the number of messages in the database is always below the limit set using |
lieer/gmailieer.py
Outdated
@@ -549,6 +539,17 @@ def remove_from_list (lst, m): | |||
|
|||
changed = True | |||
|
|||
#limiting the number of messages in the database to local.config.limit parameter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might not have the same order as the list from gmail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, that is the least we can do to make sure that the oldest messages are discarded to maintain the limit. One way might be to chose a different way to sort other than NEWEST_FIRST.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, now I see the problem. Maybe, we can go for an oldest thread. That way we only delete a set of messages which did not show up in the front in quite sometime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if there is a reliable way to make notmuch sort messages the same way as gmail does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it does come pretty close. The only warning the users should face is that the threads at the very end might be incomplete because of the message limit. But, what we can do is to make sure that threads remain complete and thus remove the old threads as a whole until the sum of the messages is below the limit. What do you prefer for this limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the recent commit, I decided to keep the threads intact and thus deciding on the number of messages to be deleted based on threads. And, I also added a progress bar to indicate that older messages are being removed.
lieer/local.py
Outdated
@@ -413,6 +419,24 @@ def messages_to_gids (self, msgs): | |||
|
|||
return (messages, gids) | |||
|
|||
def nm_messages_to_gids (self, msgs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the existing function, or share code in some way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, using the existing function might affect the performance if the number of new messages is large. Then, making a new array of messages is completely unnecessary and can add to the processing load. Should I still resort to code directly than to make a new function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have still used the function that I have added, let me know if you want me to include the code directly in gmailieer.py instead.
@@ -344,8 +333,8 @@ def _got_msgs (ms): | |||
actions = [ a for a in actions if a ] | |||
|
|||
# limit | |||
if self.limit is not None and len(actions) >= self.limit: | |||
actions = actions[:self.limit] | |||
if self.local.config.limit is not None and len(actions) >= self.local.config.limit: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>= ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not checked this part of the code actually. This does not seem necessary now anyways, if the number of messages itself is limited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actions is not the same as number of messages. i think if this PR is going to be considered for merging it needs to be thoroughly tested, over a longer time, and shown to work well. many users rely on lieer to work correctly, and we have to make sure to not break setups or cause data loss. it also needs to be tested with different limits and also with no limit. I do not guarantee that it will be merged in the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the >=
comment refers to mismatching use of sometimes >
and sometimes >=
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I did go through the mismatching operations. I did not change any of them and it is your call as they remain the same as in the original repository. On that note in the push function, I do not think the if loops to limit the messages and actions are necessary now. They can be removed. Should I do that in the next commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actions is not the same as number of messages. i think if this PR is going to be considered for merging it needs to be thoroughly tested, over a longer time, and shown to work well. many users rely on lieer to work correctly, and we have to make sure to not break setups or cause data loss. it also needs to be tested with different limits and also with no limit. I do not guarantee that it will be merged in the end.
And, yes I will test it for the said scenarios to see if it works well.
In the final commit, I made the necessary changes like removing the extra function I defined and adding indicators where the limiting action was being taken. I also tested it for the following scenarios:
And it worked as expected without any issues. The users should only be informed of one particular use case. When a user has limited the number of messages to, for example, 400 messages and after a while decided to increase the limit to like 1000, the user needs to make a |
Lattitude75 writes on July 17, 2020 10:12:
In the final commit, I made the necessary changes like removing the extra function I defined and adding indicators where the limiting action was being taken. I also tested it for the following scenarios:
- Without setting any limit
- Setting the limit and then performing multiple pulls
- Start a full pull without setting the limit, then set the limit to some number below the total messages in my email
And it worked as expected without any issues.
The users should only be informed of one particular use case. When a user has limited the number of messages to, for example, 400 messages and after a while decided to increase the limit to like 1000, the user needs to make a `gmi pull -f`. Thus, after any increase in the limit parameter set would require a forced pull. In any other case, the partial_pull function would take care to enforce the limit.
Good. It should probably link to this section in a similar way as other options
do: https://github.com/gauteh/lieer#changing-ignored-tags-and-translation-after-initial-sync
also, other options use the convention --unset-xxxx, we should stick
to the same
mode for the limit.
We need to test this for quite a while, I probably won't have time to do
detailed review in a couple of weeks. But one thought I had was that we should
probably limit the number of messages requested from gmail. This would also fix
the progressbar. Please check if such an option is possible in the gmail API.
This is closely related to #129, which was rejected. It might be that #129 is a
better approach. @coiby might still be interested in the discussion.
|
Though the limit in self.remote.all_messages(limit=None) can be used for this purpose and it does work, I do not think it would be a good idea to set the total messages for the progress bar for a use case where the number of messages is less than the limit set. The progress bar then remains incomplete, which is not very good. I think it is better to leave it here as it serves as a counter. In the latest commit, the output looks like this:
This way, the counter fetched messages serves as a final count to the number of messages to be fetched irrespective of whether it is less or more than the limit set. Edit: By the way, I will be using lieer with this commit for three of my emails with Astroid on a regular basis. This should be enough for testing in my opinion. I'll put in a comment if I notice something fishy. |
Added |
Please could some kind soul explain what this PR actually does? It sounds like something which could help solve the problems I'm experiencing with syncing a massive account. |
This, basically, makes the --limit option functional.