Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Startup very slow on master (due to "lazy thread lookup" fix) #1647

Open
lfos opened this issue Feb 17, 2024 · 5 comments
Open

Startup very slow on master (due to "lazy thread lookup" fix) #1647

lfos opened this issue Feb 17, 2024 · 5 comments

Comments

@lfos
Copy link
Contributor

lfos commented Feb 17, 2024

Describe the bug
Running alot from the master branch, startup is extremely slow (almost one minute, compared to <1 second with 0.10). The culprit seems to be commit caf87d9, based on a git-bisect(1) run. Reverting that commit on master brings the startup time back to 0.10 behavior. Note that my inbox has around 500k messages.

Software Versions

  • Python version: 3.11.8
  • Alot version: c1137ea (bump gpg dependency, 2024-02-12)
@lfos lfos changed the title alot startup very slow (due to "lazy thread lookup" fix) Startup very slow on master (due to "lazy thread lookup" fix) Feb 17, 2024
@lfos
Copy link
Contributor Author

lfos commented Sep 1, 2024

Since 0.11 has recently been released, I went back to mainline and noticed this is still a major issue. Without locally reverting caf87d9, alot is too slow for everyday use for me, despite using a faily decent system with SSD.

@pazz Given the significant performance impact, should we revert the commit? Any alternatives we can explore?

@pazz
Copy link
Owner

pazz commented Sep 3, 2024

@lucc do you have an opinion?
I don't mind reverting but cannot recall the history or having done any deep analysis on this.
that hotfix I merged does look quite intrusive

@lucc
Copy link
Collaborator

lucc commented Sep 3, 2024

I do not remember much from when the original fix was merged. Looking at the code it seems that the generator that produces the thread ids might produce an invalid thread id if the underlying message gets deleted (from the buffer or the db by the way?).

Right about now I would be happy to have a test case for the original issue so that we could test alternative solutions. Could we somehow validate the thread/id before yielding it from the generator? That way we could benefit from the lazyness but not be bitten by invalid pointer.

@lfos
Copy link
Contributor Author

lfos commented Sep 6, 2024

Some background and explanations on how to reproduce can be found in issue #1460. Based on others' comments there, commit caf87d9 seems to have been introduced as a workaround for a notmuch bug, and it is unclear whether it is still needed. FWIW, in order to be able to use alot, I have reverted caf87d9 locally, and I can't reproduce the crash when following the steps outlined in the original bug report #1460.

@lfos
Copy link
Contributor Author

lfos commented Jan 31, 2025

FWIW, I've been using this patch for a while -- essentially a mix between old and new behavior:

diff --git a/alot/db/manager.py b/alot/db/manager.py
index 39c34e2e..09c537ab 100644
--- a/alot/db/manager.py
+++ b/alot/db/manager.py
@@ -4,6 +4,7 @@
 # For further details see the COPYING file
 from collections import deque
 import contextlib
+import itertools
 import logging
 
 from notmuch2 import Database, NotmuchError, XapianError
@@ -344,11 +345,18 @@ class DBManager:
         assert sort in self._sort_orders
         db = Database(path=self.path, mode=Database.MODE.READ_ONLY,
                       config=self.config)
-        thread_ids = [t.threadid for t in db.threads(querystring,
-                            sort=self._sort_orders[sort],
-                            exclude_tags=self.exclude_tags)]
-        for t in thread_ids:
-            yield t
+
+        # retrieve and store the first 1000 thread IDs
+        thread_iterator = db.threads(querystring,
+                                     sort=self._sort_orders[sort],
+                                     exclude_tags=self.exclude_tags)
+        thread_ids = [t.threadid for t in
+                      itertools.islice(thread_iterator, 1000)]
+
+        # yield the stored thread IDs first, then proceed with iterator
+        yield from thread_ids
+        for t in thread_iterator:
+            yield t.threadid
 
     def add_message(self, path, tags=None, afterwards=None):
         """

While definitely still a workaround, it at least makes alot usable again for me (load times are acceptable again and no crashes so far).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants