Add option to make Record accumulation of revisions (from Segment), asynchronous #510

jtnelson · 2024-04-23T01:02:48Z

Right now, reading the revisions for a record is done synchronously. This means that operations can be slow if the relevant revisions for the record are not in memory (or the Manifest for each Segment must be loaded from disk).

This really affects search performance, especially if the search cache isn't enabled.

Add an experimental preference to ENABLE_ASYNC_DATA_READS. If it is enabled, in the getCorpusRecord, getIndexRecord, getLookupRecord, getTableRecord, etc change the logic

/**
     * Return the CorpusRecord identified by {@code key}.
     * 
     * @param key
     * @param query
     * @param toks {@code query} split by whitespace
     * @return the CorpusRecord
     */
    private CorpusRecord getCorpusRecord(Text key, Text infix) {
        masterLock.readLock().lock();
        try {
            Composite composite = Composite.create(key, infix);
            Cache<Composite, CorpusRecord> cache = ENABLE_SEARCH_CACHE
                    ? corpusCaches.computeIfAbsent(key, $ -> buildCache())
                    : DISABLED_CORPUS_CACHE;
            return cache.get(composite, () -> {
                CorpusRecord $ = CorpusRecord.createPartial(key, infix);
                for (Segment segment : segments) {
                    segment.corpus().seek(composite, $);
                }
                return $;
            });
        }
        catch (ExecutionException e) {
            throw CheckedExceptions.wrapAsRuntimeException(e);
        }
        finally {
            masterLock.readLock().unlock();
        }
    }

would transform into something like

AtomicInteger count = new AtomicInteger();
List<SkinnyRecord> records // come up with a better name. This should be a special Record type that does not do any locking and overrides the append() method to just add revisions to a list. Can't use AmnesiaRecord because still need to keep track of the history
List<CompletableFuture> futures
for(Segment segment: segments) {
  int c = count.getAndIncrmenet();
  SkinnyRecord $ = 
  records.add(c, $) // ensures that the revisions get sorted in the correct order eventually
  futures.add(CompletableFuture.runAsync(() -> segment.corpus().seek(composite, $)))
}
CompletableFuture.allOf(futures).get() // gotta make it an array
return CorpusRecord.merge(records) // create new factories to merge a bunch of records in order into one record that has all the revisions. See what we can do about locking so that write locks aren't grabbed when writing

// Another options is to gather each Runnable in a list and use AwaitableExecutorService.await()

Run some tests (particular with search without cahce) to see if performance is faster...especially reading from disk

The text was updated successfully, but these errors were encountered:

jtnelson · 2024-04-23T01:16:12Z

Will need to keep in mind how the record caching works. So will probably need to create the master Record that will be added to the cache and the accumulate some smaller records and then do a

record.merge(records...) that will go through each of the records in order and append all their revisions.

jtnelson added a commit that referenced this issue Jun 30, 2024

Merge branch 'develop' into feature/GH-510-eable-async-data-reads

0f36c9a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add option to make Record accumulation of revisions (from Segment), asynchronous #510

Add option to make Record accumulation of revisions (from Segment), asynchronous #510

jtnelson commented Apr 23, 2024

jtnelson commented Apr 23, 2024

Add option to make Record accumulation of revisions (from Segment), asynchronous #510

Add option to make Record accumulation of revisions (from Segment), asynchronous #510

Comments

jtnelson commented Apr 23, 2024

jtnelson commented Apr 23, 2024