Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-126703: Add PyCFunction freelist #128692

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Jan 9, 2025

Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with one comment.

@@ -22,6 +22,8 @@ extern "C" {
# define Py_futureiters_MAXFREELIST 255
# define Py_object_stack_chunks_MAXFREELIST 4
# define Py_unicode_writers_MAXFREELIST 1
# define Py_pycfunctionobject_MAXFREELIST 16
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these lengths being chosen? 16 seems fine for PyCFunction, but I would think that methods are created much more frequently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choosing the number is a bit of an art (and a good number will depend what kind of code is executed).

How well the freelist works depends on the dynamics of the allocation and deallocation. For example the size of Py_unicode_writers_MAXFREELIST is only 1. I suspect this is because the typical use case is: create a writer, write some data, release it again. And during the writing of the data no new unicode writers are constructed.

With the diff below you can see whether a PyCMethodObject is obtained from the freelist (if the size > 0), or whether a new one is allocated (when size = 0).

diff --git a/Include/internal/pycore_freelist.h b/Include/internal/pycore_freelist.h
index 84a5ab30f3e..90d6616cbfd 100644
--- a/Include/internal/pycore_freelist.h
+++ b/Include/internal/pycore_freelist.h
@@ -28,6 +28,10 @@ _Py_freelists_GET(void)
 #endif
 }

+#define _Py_FREELIST_FREE_PRINT(NAME, op, freefunc) \
+    _PyFreeList_Free_Print(&_Py_freelists_GET()->NAME, _PyObject_CAST(op), \
+                     Py_ ## NAME ## _MAXFREELIST, freefunc, #NAME)
+
 // Pushes `op` to the freelist, calls `freefunc` if the freelist is full
 #define _Py_FREELIST_FREE(NAME, op, freefunc) \
     _PyFreeList_Free(&_Py_freelists_GET()->NAME, _PyObject_CAST(op), \
@@ -69,6 +73,16 @@ _PyFreeList_Free(struct _Py_freelist *fl, void *obj, Py_ssize_t maxsize,
     }
 }

+static inline void
+_PyFreeList_Free_Print(struct _Py_freelist *fl, void *obj, Py_ssize_t maxsize,
+                 freefunc dofree, const char *name)
+{
+    if (!_PyFreeList_Push(fl, obj, maxsize)) {
+        printf("    %s: no space to store object\n", name);
+        dofree(obj);
+    }
+}
+
 static inline void *
 _PyFreeList_PopNoStats(struct _Py_freelist *fl)
 {
diff --git a/Objects/methodobject.c b/Objects/methodobject.c
index fc77055b0a2..58d928d78bf 100644
--- a/Objects/methodobject.c
+++ b/Objects/methodobject.c
@@ -86,6 +86,7 @@ PyCMethod_New(PyMethodDef *ml, PyObject *self, PyObject *module, PyTypeObject *c
                             "flag but no class");
             return NULL;
         }
+        printf("PyCMethodObject: try allocation (freelist size %d)\n", _Py_FREELIST_SIZE(pycmethodobject));
         PyCMethodObject *om = _Py_FREELIST_POP(PyCMethodObject, pycmethodobject);
         if (om == NULL) {
             om = PyObject_GC_New(PyCMethodObject, &PyCMethod_Type);
@@ -102,6 +103,7 @@ PyCMethod_New(PyMethodDef *ml, PyObject *self, PyObject *module, PyTypeObject *c
                             "but no METH_METHOD flag");
             return NULL;
         }
+        printf("PyCFunctionObject: try allocation (freelist size %d)\n", _Py_FREELIST_SIZE(pycfunctionobject));
         op = _Py_FREELIST_POP(PyCFunctionObject, pycfunctionobject);
         if (op == NULL) {
             op = PyObject_GC_New(PyCFunctionObject, &PyCFunction_Type);
@@ -180,11 +182,11 @@ meth_dealloc(PyObject *self)
     Py_XDECREF(m->m_module);
     if (m->m_ml->ml_flags & METH_METHOD) {
         assert(Py_IS_TYPE(self, &PyCMethod_Type));
-        _Py_FREELIST_FREE(pycmethodobject, m, PyObject_GC_Del);
+        _Py_FREELIST_FREE_PRINT(pycmethodobject, m, PyObject_GC_Del);
     }
     else {
         assert(Py_IS_TYPE(self, &PyCFunction_Type));
-        _Py_FREELIST_FREE(pycfunctionobject, m, PyObject_GC_Del);
+        _Py_FREELIST_FREE_PRINT(pycfunctionobject, m, PyObject_GC_Del);
     }
     Py_TRASHCAN_END;
 }

When I use this on python -m test test_operator:

  • there are many allocations at the start of the program (because nothing has been deallocated, so there is nothing on the freelist)
  • there is a dynamic section where often objects are allocated and deallocated, the freelist size is changing a lot, but it is not reaching the maximum size of 16
  • at the end the python interpreter is closing down, so many objects are deallocated

Based on this there would be no need to increase the freelist size. On the other hand, it is almost free to increase the size (the only memory is the objects on the freelist).

I am fine with changing the size (anything between 4 and 400 would be fine with me), but I am not sure it matters or how to make a more informed decision.

Full output is at:

https://gist.github.com/eendebakpt/7a867587450dca4689bb46271fb01ec2

@vstinner
Copy link
Member

Would you mind to rerun the benchmark on this PR?

@eendebakpt
Copy link
Contributor Author

Here are benchmarks against current main (Linux, PGO). The first benchmark tests the freelist from this PR, the other two are control benchmarks (they should not be affected):

bench_builtin_or_method: Mean +- std dev: [main0] 5.51 us +- 0.11 us -> [pr] 4.03 us +- 0.07 us: 1.37x faster
bench_property: Mean +- std dev: [main0] 1.68 us +- 0.02 us -> [pr] 1.70 us +- 0.03 us: 1.01x slower
bench_class_method: Mean +- std dev: [main0] 1.90 us +- 0.05 us -> [pr] 1.88 us +- 0.04 us: 1.01x faster

Geometric mean: 1.11x faster
Script
# Quick benchmark for cpython freelists

import pyperf

def bench_builtin_or_method(loops):
    range_it = iter(range(loops))
    tpl = tuple(range(50))

    lst = []
    it = iter(set([2, 3, 4]))
    t0 = pyperf.perf_counter()
    for ii in range_it:
        for ii in tpl:
            lst.append
            it.__length_hint__
    return pyperf.perf_counter() - t0


class A:
    def __init__(self, value):
        self.value = value

    def x(self):
        return self.value

    @property
    def v(self):
        return self.value


def bench_property(loops):
    range_it = iter(range(loops))
    tpl = tuple(range(50))

    t0 = pyperf.perf_counter()
    for ii in range_it:
        a = A(ii)
        for ii in tpl:
            _ = a.v
    return pyperf.perf_counter() - t0


def bench_class_method(loops):
    range_it = iter(range(loops))
    tpl = tuple(range(50))

    t0 = pyperf.perf_counter()
    for ii in range_it:
        a = A(ii)
        for ii in tpl:
            _ = a.x()
    return pyperf.perf_counter() - t0


if __name__ == "__main__":
    runner = pyperf.Runner()
    runner.bench_time_func("bench_builtin_or_method", bench_builtin_or_method)
    runner.bench_time_func("bench_property", bench_property)
    runner.bench_time_func("bench_class_method", bench_class_method)

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants