Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to make Record accumulation of revisions (from Segment), asynchronous #510

Open
jtnelson opened this issue Apr 23, 2024 · 1 comment

Comments

@jtnelson
Copy link
Member

Right now, reading the revisions for a record is done synchronously. This means that operations can be slow if the relevant revisions for the record are not in memory (or the Manifest for each Segment must be loaded from disk).

This really affects search performance, especially if the search cache isn't enabled.

Add an experimental preference to ENABLE_ASYNC_DATA_READS. If it is enabled, in the getCorpusRecord, getIndexRecord, getLookupRecord, getTableRecord, etc change the logic

/**
     * Return the CorpusRecord identified by {@code key}.
     * 
     * @param key
     * @param query
     * @param toks {@code query} split by whitespace
     * @return the CorpusRecord
     */
    private CorpusRecord getCorpusRecord(Text key, Text infix) {
        masterLock.readLock().lock();
        try {
            Composite composite = Composite.create(key, infix);
            Cache<Composite, CorpusRecord> cache = ENABLE_SEARCH_CACHE
                    ? corpusCaches.computeIfAbsent(key, $ -> buildCache())
                    : DISABLED_CORPUS_CACHE;
            return cache.get(composite, () -> {
                CorpusRecord $ = CorpusRecord.createPartial(key, infix);
                for (Segment segment : segments) {
                    segment.corpus().seek(composite, $);
                }
                return $;
            });
        }
        catch (ExecutionException e) {
            throw CheckedExceptions.wrapAsRuntimeException(e);
        }
        finally {
            masterLock.readLock().unlock();
        }
    }

would transform into something like

AtomicInteger count = new AtomicInteger();
List<SkinnyRecord> records // come up with a better name. This should be a special Record type that does not do any locking and overrides the append() method to just add revisions to a list. Can't use AmnesiaRecord because still need to keep track of the history
List<CompletableFuture> futures
for(Segment segment: segments) {
  int c = count.getAndIncrmenet();
  SkinnyRecord $ = 
  records.add(c, $) // ensures that the revisions get sorted in the correct order eventually
  futures.add(CompletableFuture.runAsync(() -> segment.corpus().seek(composite, $)))
}
CompletableFuture.allOf(futures).get() // gotta make it an array
return CorpusRecord.merge(records) // create new factories to merge a bunch of records in order into one record that has all the revisions. See what we can do about locking so that write locks aren't grabbed when writing

// Another options is to gather each Runnable in a list and use AwaitableExecutorService.await()

Run some tests (particular with search without cahce) to see if performance is faster...especially reading from disk

@jtnelson
Copy link
Member Author

Will need to keep in mind how the record caching works. So will probably need to create the master Record that will be added to the cache and the accumulate some smaller records and then do a

record.merge(records...) that will go through each of the records in order and append all their revisions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant