cache repo instances to reduce nb of external cmds
With the current design of kas, repo instances are created on the fly,
as the repo configuration can be changed (overridden) by new configs
from include files. As this happens during the checkout of the
repositories, all repos need to be re-created after each config update
iteration. This even affects cases where plugins access the repos just
for evaluation.

During the creation of the repo instances, shell commands are executed
that both leave traces in the kas log and are potentially costly on big
repositories.
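The two paragraphs above can be made concrete with a minimal sketch that counts how many (simulated) external commands would run if every repo were re-created on each config update iteration. `FakeRepo`, `resolve_repos` and the example URLs are hypothetical stand-ins, not kas APIs, and nothing is actually executed.

```python
# Hypothetical stand-in for kas' Repo: constructing it "runs" an external
# command, which we only count here instead of executing anything.
class FakeRepo:
    commands_run = 0

    def __init__(self, name, config):
        self.name = name
        self.config = config
        # In kas, repo creation may shell out to the VCS; here we only
        # record that a command would have been run.
        FakeRepo.commands_run += 1

    @staticmethod
    def factory(name, config):
        return FakeRepo(name, config)


def resolve_repos(repo_configs, iterations):
    """Re-create every repo after each config update iteration (no cache)."""
    repos = {}
    for _ in range(iterations):
        for name, config in repo_configs.items():
            repos[name] = FakeRepo.factory(name, config)
    return repos


configs = {"poky": {"url": "https://example.com/poky.git"},
           "meta-foo": {"url": "https://example.com/meta-foo.git"}}
resolve_repos(configs, iterations=3)
print(FakeRepo.commands_run)  # 6: 2 repos x 3 iterations
```

The number of commands grows with repos times iterations, even though the configuration of most repos never changes between iterations.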

As the whole repo creation logic cannot be changed without a major
refactoring, this patch introduces a cache for repo instances: the
whole set of input arguments is hashed, and repos that did not get
updated are returned from the cache instead of being reconstructed.
This makes the log output much more readable (50% fewer lines on some
layers).
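A minimal sketch of the caching idea, assuming a hypothetical `FakeRepo` in place of kas' `Repo` and a hypothetical `RepoCache` class in place of the `Config` attributes shown in the diff. Like the patch, it serializes the full argument tuple with `json.dumps(..., sort_keys=True)` and uses the resulting bytes as a dictionary key, which Python then hashes.

```python
import json


class FakeRepo:
    """Hypothetical stand-in for kas' Repo; counts simulated external commands."""
    commands_run = 0

    def __init__(self, name, config):
        self.name = name
        self.config = config
        FakeRepo.commands_run += 1

    @staticmethod
    def factory(name, config):
        return FakeRepo(name, config)


class RepoCache:
    """Memoizes FakeRepo.factory on the full set of its input arguments."""

    def __init__(self):
        self._repos = {}

    def get_or_create(self, *args):
        # Serialize all arguments into a canonical byte string; identical
        # inputs map to the same key, so the repo is built only once.
        key = json.dumps(args, sort_keys=True).encode()
        if key not in self._repos:
            self._repos[key] = FakeRepo.factory(*args)
        return self._repos[key]


cache = RepoCache()
base = {"url": "https://example.com/poky.git"}
for _ in range(3):  # three config update iterations, config unchanged
    repo = cache.get_or_create("poky", base)
print(FakeRepo.commands_run)  # 1

# An override changes the arguments, so a new instance is created.
cache.get_or_create("poky", {**base, "branch": "next"})
print(FakeRepo.commands_run)  # 2
```

Because the key covers every argument, an override simply produces a new key rather than invalidating an existing entry. The trade-off in this sketch is that superseded instances are never evicted for the lifetime of the cache.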

Signed-off-by: Felix Moessbauer <[email protected]>
Signed-off-by: Jan Kiszka <[email protected]>
fmoessbauer authored and jan-kiszka committed Feb 12, 2024
1 parent 194bd0f commit becd3bf
Showing 2 changed files with 22 additions and 5 deletions.
26 changes: 21 additions & 5 deletions kas/config.py
@@ -24,6 +24,7 @@
 """
 
 import os
+import json
 from .repos import Repo
 from .includehandler import IncludeHandler, IncludeException
 
@@ -64,6 +65,7 @@ def __init__(self, ctx, filename, target=None, task=None):
                                       top_repo_path,
                                       not update)
         self.repo_dict = self._get_repo_dict()
+        self.repo_cfg_hashes = {}
 
     def get_build_system(self):
         """
@@ -112,11 +114,25 @@ def get_repo(self, name):
             .get('repos', {}).get(name, {})
         config = self.get_repos_config()[name] or {}
         top_repo_path = self.handler.get_top_repo_path()
-        return Repo.factory(name,
-                            config,
-                            repo_defaults,
-                            top_repo_path,
-                            overrides)
+
+        # Check if we have this repo with an identical config already.
+        # As this function is called across various places and with different
+        # configurations (e.g. due to updates from transitive includes),
+        # we cache the results.
+        args = (name, config, repo_defaults, top_repo_path, overrides)
+        return self._get_or_create_repo(args)
+
+    def _get_or_create_repo(self, args):
+        """
+        Get a repo from the cache and insert it if not existing.
+        Creating repos is expensive due to external commands being called.
+        """
+        encoded = json.dumps(args, sort_keys=True).encode()
+        if encoded in self.repo_cfg_hashes:
+            return self.repo_cfg_hashes[encoded]
+        repo = Repo.factory(*args)
+        self.repo_cfg_hashes[encoded] = repo
+        return repo
 
     def _get_repo_dict(self):
         """
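The choice of `json.dumps(args, sort_keys=True)` as the cache key has a few properties worth checking. The sketch below re-implements only the key derivation (the helper name `cache_key` is hypothetical) and shows that dict key order is irrelevant, that a changed value produces a new key, that tuples and lists collapse to the same JSON array, and that arguments without a JSON representation, such as sets, raise `TypeError`.

```python
import json


def cache_key(*args):
    """Same key derivation as _get_or_create_repo: serialize all arguments."""
    return json.dumps(args, sort_keys=True).encode()


# Dict insertion order does not matter thanks to sort_keys=True ...
a = cache_key("poky", {"url": "u", "branch": "b"})
b = cache_key("poky", {"branch": "b", "url": "u"})
print(a == b)  # True

# ... but any real change to the configuration yields a different key.
c = cache_key("poky", {"url": "u", "branch": "other"})
print(a == c)  # False

# JSON has no tuple type, so tuples and lists serialize identically.
print(cache_key(("x", "y")) == cache_key(["x", "y"]))  # True

# Values without a JSON representation (e.g. sets) cannot be keyed.
set_error = None
try:
    cache_key("poky", {"layers": {"meta"}})
except TypeError as err:
    set_error = err
    print("not cacheable:", err)
```

The last point means the cache implicitly assumes that repo configs, defaults and overrides contain only mappings, lists, strings, numbers, booleans and null.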
1 change: 1 addition & 0 deletions kas/repos.py
@@ -151,6 +151,7 @@ def factory(name, repo_config, repo_defaults, repo_fallback_path,
                 repo_overrides={}):
         """
         Returns a Repo instance depending on params.
+        This factory function is referential transparent.
         """
         layers_dict = repo_config.get('layers', {'': None})
         layers = list(filter(lambda x, laydict=layers_dict:
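The added docstring matters because the cache is only sound if `Repo.factory` depends on nothing but its arguments. The sketch below, with hypothetical names throughout (`impure_factory`, `DEFAULT_BRANCH`, `cached`, `lru_factory`), shows what goes wrong otherwise: when a hidden input changes, the cache key stays the same and a stale result is returned. It also illustrates why `functools.lru_cache` is not a drop-in alternative here: it requires hashable arguments, and dicts are not.

```python
import functools
import json

# Hypothetical hidden input, standing in for anything a non-transparent
# factory might consult besides its arguments (environment, cwd, ...).
DEFAULT_BRANCH = "main"


def impure_factory(name, config):
    # Result depends on DEFAULT_BRANCH, which is NOT part of the arguments.
    return {"name": name, "branch": config.get("branch", DEFAULT_BRANCH)}


_cache = {}


def cached(name, config):
    # Same key scheme as the patch: serialize all arguments.
    key = json.dumps((name, config), sort_keys=True).encode()
    if key not in _cache:
        _cache[key] = impure_factory(name, config)
    return _cache[key]


first = cached("poky", {})
DEFAULT_BRANCH = "next"                      # the hidden input changes ...
second = cached("poky", {})                  # ... but the cache key does not
print(second["branch"])                      # main (stale)
print(impure_factory("poky", {})["branch"])  # next


# lru_cache hashes its arguments directly, so a dict argument is rejected.
@functools.lru_cache(maxsize=None)
def lru_factory(name, config):
    return impure_factory(name, config)


lru_error = None
try:
    lru_factory("poky", {})
except TypeError as err:
    lru_error = err
print(type(lru_error).__name__)  # TypeError
```

Serializing to JSON sidesteps the hashability problem, but the correctness of the cache still rests entirely on the factory being referentially transparent.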
