Lazy browse records #13

Open
etsinko opened this issue Jan 13, 2015 · 4 comments

Comments

@etsinko
Contributor

etsinko commented Jan 13, 2015

Hello Sébastien,

I noticed that, upon initialization of the BrowseRecord class by calling oerp.browse('model', id), OERPLib populates the newly created browse record with all fields from the server. This becomes slow for large models (especially when you only need to read one field), so I never use browse records and instead only call oerp.read() to save time.

What I like about browse records is that they cache their data, i.e. once a record is loaded you can access it many times without re-reading the data from the server.

I was wondering if there is a way to speed them up. One solution would be to implement a lazily initialized BrowseRecord. Upon creation this record would not do anything; when a field is accessed, it would check its cache first and, if the field is not there, fetch it from the server. This should improve the speed considerably.
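A rough sketch of what I have in mind (the class name and the exact call to oerp.read() below are just illustrative, not OERPLib internals):

class LazyBrowseRecord(object):
    """Illustrative sketch: fetch a field the first time it is accessed,
    then serve it from a per-record cache."""

    def __init__(self, oerp, model_name, record_id):
        self._oerp = oerp
        self._model_name = model_name
        self._cache = {}
        self.id = record_id

    def __getattr__(self, name):
        # One RPC call per missing field; later accesses hit the cache
        if name not in self._cache:
            data = self._oerp.read(self._model_name, [self.id], [name])[0]
            self._cache[name] = data[name]
        return self._cache[name]

record = LazyBrowseRecord(oerp, 'res.users', oerp.user.id)
record.name    # first access: one read() call
record.name    # second access: served from the cache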

What do you think?

Egor

@sebalix
Contributor

sebalix commented Jan 13, 2015

Hi Egor,

Yes, browse records can be a bit slow sometimes (some improvements could be done, see below).

It is very difficult to keep a cache of records in OERPLib: we can't rely on cached data if some of it has been updated in the meantime on the server, and we can't check whether the cache is outdated on every access (that would cost a lot of RPC queries, so there would be no gain). So I chose to fetch the data each time we need it:

>>> id(oerp.user.company_id) == id(oerp.user.company_id)
False

If you want to keep a record, you have to reference it:

company = oerp.user.company_id

Having a cache at the field level is difficult too, as some field values depend on other fields of the record (something like @api.depends in the new Odoo API), and, as said above, caching is difficult anyway.

But in fact not all fields of a record are fetched: this is the case for all relational fields, for access rights reasons.
The improvement that could be made concerns binary fields (images on partners and users...): we could avoid fetching them, as is already done for relational fields.
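In the meantime, a client-side workaround is to ask read() only for the fields you actually need, for example by filtering out binary fields with fields_get(). A sketch (just a workaround on the caller's side, not something OERPLib does for you):

# Read all records of a model while skipping binary fields
user_model = oerp.get('res.users')
field_defs = user_model.fields_get()
light_fields = [name for name, attrs in field_defs.items()
                if attrs.get('type') != 'binary']
users = user_model.read(user_model.search([]), light_fields)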

Regards,

@etsinko
Contributor Author

etsinko commented Jan 13, 2015

Sébastien,
I see your point, it would indeed be too difficult to implement caching here.
Maybe, as an alternative, there could be a light version of BrowseRecord that only allows reading and refreshing.
Here is a short snippet that shows how slow browse records are for reading:

import time
import oerplib

oerp = oerplib.OERP(server='localhost', database='production_local')
oerp.login(user='***', passwd='***')

model = oerp.get('mrp.production')
ids = model.search([])
print 'Length = {} records'.format(len(ids))

t = time.time()
for obj in model.browse(ids):
    obj.name
print 'Browse records = {} s'.format(time.time() - t)

t = time.time()
for id in ids:
    model.read(id, ['name'])
print 'Read method (loop) = {} s'.format(time.time() - t)

t = time.time()
model.read(ids, ['name'])
print 'Read method (batch) = {} s'.format(time.time() - t)

And here is the output:

Length = 2095 records
Browse records = 66.3647060394 s
Read method (loop) = 16.3761920929 s
Read method (batch) = 0.161270141602 s

@sebalix
Contributor

sebalix commented Jan 13, 2015

Thanks for your feedback. But there is no way to fetch only one field with browse() (except by modifying its signature to accept a list of fields). A more accurate test would be:

task_model = oerp.get('project.task')
task_ids = task_model.search([])
print 'Length = {} records'.format(len(task_ids))

t = time.time()
for task in task_model.browse(task_ids):
    task.name
print "Browse records (loop) = {} s".format(time.time() - t)
print "Browse records (batch) = TO IMPLEMENT :)"

t = time.time()
for id_ in task_ids:
    task_model.read([id_])
print "Read method (loop) = {} s".format(time.time() - t)

t = time.time()
for id_ in task_ids:
    task_model.read([id_], ['name'])
print "Read method (loop) [1 field] = {} s".format(time.time() - t)

t = time.time()
for id_ in task_ids:
    task_model.read([id_], ['id', 'name', 'project_id'])
print "Read method (loop) [3 fields] = {} s".format(time.time() - t)

t = time.time()
for id_ in task_ids:
    task_model.read([id_], ['id', 'name', 'project_id', 'message_follower_ids'])
print "Read method (loop) [3 fields + 1 comp.] = {} s".format(time.time() - t)

t = time.time()
task_model.read(task_ids)
print "Read method (batch) = {} s".format(time.time() - t)

t = time.time()
task_model.read(task_ids, ['name'])
print "Read method only (batch) [1 field] = {} s".format(time.time() - t)

t = time.time()
task_model.read(task_ids, ['id', 'name', 'project_id'])
print "Read method (batch) [3 fields] = {} s".format(time.time() - t)

t = time.time()
task_model.read(task_ids, ['id', 'name', 'project_id', 'message_follower_ids'])
print "Read method (batch) [3 fields + 1 comp.] = {} s".format(time.time() - t)

The output (test done on a remote server, not local):

Length = 182 records
Browse records (loop) = 55.3023481369 s
Browse records (batch) = TO IMPLEMENT :)
Read method (loop) = 57.8453090191 s
Read method (loop) [1 field] = 48.6828379631 s
Read method (loop) [3 fields] = 49.1296329498 s
Read method (loop) [3 fields + 1 comp.] = 53.3451161385 s
Read method (batch) = 1.34297895432 s
Read method (batch) [1 field] = 0.35977101326 s
Read method (batch) [3 fields] = 0.394207954407 s
Read method (batch) [3 fields + 1 comp.] = 0.496376037598 s

We can see that the performance of browse is equivalent to the read method here.
But we can also see that adding just one computed field (and a simple one) costs more than 4 s in loop mode.

Reading or browsing all fields is slower due to:

  • computed fields (also depending on @api.one / @api.multi / fields.function multi)
  • binary fields

What could be done to improve OERPLib:

  • lazy loading for binary fields,
  • lazy loading for computed fields,
  • optional batch mode for the browse() method to fetch all records at once (currently records are fetched one by one, with a Python generator),
  • change the signature of browse() to pass fields: browse(ids, fields_list, context=None) (a hypothetical call is sketched below).
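For illustration, combining the last two points could look like this (purely hypothetical, such a signature does not exist in OERPLib today):

# Hypothetical batch browse restricted to a list of fields
for task in task_model.browse(task_ids, ['name', 'project_id']):
    task.name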

EDIT: the same test performed locally

Length = 182 records
Browse records (loop) = 2.53341007233 s
Browse records (batch) = TO IMPLEMENT :)
Read method (loop) = 4.53692889214 s
Read method (loop) [1 field] = 0.412869930267 s
Read method (loop) [3 fields] = 0.602219820023 s
Read method (loop) [3 fields + 1 comp.] = 1.2210290432 s
Read method (batch) = 0.660733222961 s
Read method (batch) [1 field] = 0.0223989486694 s
Read method (batch) [3 fields] = 0.0404279232025 s
Read method (batch) [3 fields + 1 comp.] = 0.0892918109894 s

@etsinko
Contributor Author

etsinko commented Jan 13, 2015

Hi Sébastien,
I totally understand that reading all fields + computed fields in a loop is going to have the same performance as reading browse records in a loop. In my situation the use case is that only one or two fields need to be loaded (probably drilling down into reference fields).
I like your suggestion to change the signature of browse() to take an optional field list. I believe this could improve the speed in cases like mine.

Here is a quick and dirty cacheable object proxy I created which reads objects and stores them in a cache. I assumed that the oerp instance is hashable and allowed only one proxy object per record:

_cache = {}
_models = {}


class ObjectProxy(object):
    """
    Cacheable read-only object proxy
    """

    def __new__(cls, oerp, model_name, obj_id, *args, **kwargs):
        # Initialize both object and model caches
        if oerp not in _models:
            _models[oerp] = {}
        if model_name not in _models[oerp]:
            _models[oerp][model_name] = oerp.get(model_name)
        if oerp not in _cache:
            _cache[oerp] = {}
        if model_name not in _cache[oerp]:
            _cache[oerp][model_name] = {}
        if obj_id not in _cache[oerp][model_name]:
            _cache[oerp][model_name][obj_id] = object.__new__(cls)

        return _cache[oerp][model_name][obj_id]

    def __init__(self, oerp, model_name, obj_id):
        self.oerp = oerp
        self.model = _models[oerp][model_name]
        self._cache = {}
        self.id = obj_id

    def refresh(self):
        # To prevent recursion we copy related fields first, clear cache and only then refresh related fields
        rel_fields = []
        for field in self._cache.values():
            if isinstance(field, ObjectProxy):
                rel_fields.append(field)
            elif isinstance(field, (list, tuple)):
                rel_fields += field
        self._cache.clear()
        for field in rel_fields:
            field.refresh()

    def __getattr__(self, item):
        if item not in self._cache:
            # Hacky way to get column (we don't want to re-read fields from the server, as it is redundant)
            column = self.model._browse_class.__osv__['columns'][item]
            val = self.model.read(self.id, [item])[item]
            if hasattr(column, 'relation') and val:
                if column.type in ['one2many', 'many2many']:
                    self._cache[item] = [ObjectProxy(self.oerp, column.relation, item_id) for item_id in val]
                else:
                    item_id, name = val
                    self._cache[item] = ObjectProxy(self.oerp, column.relation, item_id)
            else:
                self._cache[item] = val
        return self._cache[item]

    def __eq__(self, other):
        return isinstance(other, ObjectProxy) and other.id == self.id and other.model._name == self.model._name

    def __ne__(self, other):
        return not isinstance(other, ObjectProxy) or other.id != self.id or other.model._name != self.model._name

    def __repr__(self):
        return 'Proxy [{},{}]'.format(self.model._name, self.id)


def clear_cache():
    _cache.clear()


def create_proxy(oerp, model_name, obj_ids):
    if isinstance(obj_ids, (list, tuple)):
        return [ObjectProxy(oerp, model_name, obj_id) for obj_id in obj_ids]
    return ObjectProxy(oerp, model_name, obj_ids)

And here is a usage example:

import oerplib
oerp = oerplib.OERP(server='localhost', database='production_local')
oerp.login(user='***', passwd='***')

proxies = create_proxy(oerp, 'mrp.production', oerp.search('mrp.production', []))
print len(proxies)

proxy = proxies[-1]
proxy2 = proxies[-2]
print proxy.workcenter_lines[0].name
print proxy2.product_id.default_code
proxy.refresh()

Output:

2095
Cable Assembly - NOV VDT Dongle 287-VD1-T33 
11-250V
