gh-127936: convert marshal module to use import/export API for ints (PEP 757) #128530

Open: wants to merge 5 commits into main


@skirpichev (Member) commented Jan 6, 2025:

| Benchmark | ref-dumps | patch-dumps |
|---|---|---|
| dumps 1<<38 | 628 ns | 640 ns: 1.02x slower |
| dumps 1<<300 | 635 ns | 696 ns: 1.10x slower |
| dumps 1<<3000 | 2.02 us | 2.28 us: 1.13x slower |
| Geometric mean | (ref) | 1.06x slower |

Benchmark hidden because not significant (1): dumps 1<<7

| Benchmark | ref-loads | patch-loads |
|---|---|---|
| loads 1<<7 | 303 ns | 311 ns: 1.03x slower |
| loads 1<<38 | 334 ns | 388 ns: 1.16x slower |
| loads 1<<300 | 516 ns | 605 ns: 1.17x slower |
| loads 1<<3000 | 2.08 us | 2.56 us: 1.23x slower |
| Geometric mean | (ref) | 1.15x slower |
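The "Geometric mean" rows can be roughly reproduced from the per-benchmark ratios. A minimal sketch using Python's statistics module; the inputs are the rounded ratios from the tables above, and the 1.00 entry for the hidden dumps 1<<7 benchmark is an assumption (pyperf only reports it as not significant).

```python
from statistics import geometric_mean

# Slowdown ratios (patch / ref), as printed in the tables above.
loads_ratios = [1.03, 1.16, 1.17, 1.23]

# Assumption: the hidden "dumps 1<<7" benchmark is counted as 1.00x.
dumps_ratios = [1.00, 1.02, 1.10, 1.13]

print(f"loads: {geometric_mean(loads_ratios):.3f}x slower")
print(f"dumps: {geometric_mean(dumps_ratios):.3f}x slower")
# Dropping the hidden benchmark gives a larger dumps mean.
print(f"dumps without 1<<7: {geometric_mean(dumps_ratios[1:]):.3f}x slower")
```

The loads mean comes out near 1.15x. The dumps mean lands near the reported 1.06x only when the hidden benchmark is included (about 1.08x without it), which is consistent with the summary row covering all four dumps benchmarks.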
Benchmark scripts:

```python
# bench-dumps.py

from marshal import dumps
import pyperf

values = ['1<<7', '1<<38', '1<<300', '1<<3000']
runner = pyperf.Runner()
for v in values:
    i = eval(v)
    bn = 'dumps ' + v
    runner.bench_func(bn, dumps, i)
```

```python
# bench-loads.py

from marshal import loads, dumps
import pyperf

values = ['1<<7', '1<<38', '1<<300', '1<<3000']
runner = pyperf.Runner()
for v in values:
    d = dumps(eval(v))
    bn = 'loads ' + v
    runner.bench_func(bn, loads, d)
```
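Not every benchmarked value uses the same marshal record type. In CPython's marshal format, ints that fit in a signed 32-bit value are written as type 'i' and larger ones as type 'l' (TYPE_LONG); the high bit of the type byte (FLAG_REF, 0x80) may be set and is masked off below. A small sketch to check this; marshal_type_code is a hypothetical helper.

```python
import marshal

FLAG_REF = 0x80  # marshal may OR this bit into the type byte


def marshal_type_code(value: int) -> str:
    """Return the marshal type code used for value (hypothetical helper)."""
    return chr(marshal.dumps(value)[0] & ~FLAG_REF)


# Same expressions as in the benchmark scripts above.
for expr in ['1<<7', '1<<38', '1<<300', '1<<3000']:
    print(expr, marshal_type_code(eval(expr)))
```

Only 1<<7 is stored as an 'i' record; the other three values use the TYPE_LONG ('l') encoding with 15-bit digits.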

@skirpichev force-pushed the port-marshal-to-pep757/127936 branch from 1ab0c30 to 4bd6b0c on January 7, 2025 05:06
@skirpichev marked this pull request as ready for review on January 7, 2025 05:34
@skirpichev (Member, Author):

CC @vstinner

I suspect that the major slowdown in the marshal.loads() benchmarks is due to normalization and singletonization (returning cached small ints) in PyLongWriter_Finish(). For marshal.dumps(), I suspect things were faster without the value field of PyLongExport.
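For reference, a minimal pure-Python sketch of a TYPE_LONG record, assuming the layout written by CPython's marshal.c: a type byte 'l', a signed 32-bit little-endian count n of 15-bit digits whose sign is the sign of the value, then |n| digits stored as 16-bit little-endian words, least significant first. decode_marshal_long is a hypothetical name. Like the C reader, it rejects a zero top digit, so valid marshal data already arrives normalized; this is consistent with the normalization step in PyLongWriter_Finish() being partly redundant work here.

```python
import marshal
import struct

# Record layout assumed from CPython's marshal.c.
TYPE_LONG = ord('l')
FLAG_REF = 0x80
MARSHAL_SHIFT = 15  # bits per marshal digit


def decode_marshal_long(data: bytes) -> int:
    """Decode one marshal TYPE_LONG record (illustrative sketch)."""
    if (data[0] & ~FLAG_REF) != TYPE_LONG:
        raise ValueError("not a TYPE_LONG record")
    (n,) = struct.unpack_from('<i', data, 1)  # signed digit count
    digits = struct.unpack_from(f'<{abs(n)}H', data, 5)
    if digits and digits[-1] == 0:
        # marshal.c reports "bad marshal data (unnormalized long data)"
        raise ValueError("unnormalized long data")
    value = 0
    for d in reversed(digits):  # most significant digit first
        if d >> MARSHAL_SHIFT:
            raise ValueError("digit out of range")
        value = (value << MARSHAL_SHIFT) | d
    return -value if n < 0 else value


for x in [1 << 38, -(1 << 300), 1 << 3000]:
    assert decode_marshal_long(marshal.dumps(x)) == x
```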

Python/marshal.c: five review threads, all resolved (four of them on outdated code).
```c
assert(layout->digit_endianness == (PY_LITTLE_ENDIAN ? -1 : 1));
assert(layout->digit_size == 2 || layout->digit_size == 4);

Py_ssize_t size = 1 + (Py_ABS(n) - 1) / marshal_ratio;
```
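One way to sanity-check the size line above for nonzero n is to compare it with the number of internal digits the value actually needs. A minimal sketch, assuming marshal_ratio corresponds to sys.int_info.bits_per_digit // 15 (the existing PyLong_MARSHAL_RATIO in marshal.c is PyLong_SHIFT / PyLong_MARSHAL_SHIFT); internal_size_from_record is a hypothetical helper, and n == 0 is excluded on purpose because Python's // floors where C's / truncates.

```python
import marshal
import struct
import sys

MARSHAL_SHIFT = 15
BITS_PER_DIGIT = sys.int_info.bits_per_digit  # 30 on most builds, 15 on some
marshal_ratio = BITS_PER_DIGIT // MARSHAL_SHIFT  # assumed analogue


def internal_size_from_record(data: bytes) -> int:
    """Compute 1 + (|n| - 1) / marshal_ratio from a TYPE_LONG record."""
    (n,) = struct.unpack_from('<i', data, 1)  # signed marshal digit count
    if n == 0:
        raise ValueError("n == 0 is not handled in this sketch")
    # For |n| >= 1 the dividend is non-negative, so // agrees with C's /.
    return 1 + (abs(n) - 1) // marshal_ratio


for x in [2**31, 1 << 38, 1 << 300, -(1 << 3000)]:
    size = internal_size_from_record(marshal.dumps(x))
    needed = -(-x.bit_length() // BITS_PER_DIGIT)  # ceiling division
    print(x.bit_length(), size, needed)
    assert size == needed
```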
Member:

You need a special case for n==0, like the code that you removed, no?

@skirpichev (Member, Author), Jan 7, 2025:

I think not; it's just size=1.

Edit: but 0 has one digit allocated anyway, so everything works correctly:

```pycon
>>> import marshal
>>> marshal.dumps(0)
b'\xe9\x00\x00\x00\x00'
>>> marshal.loads(marshal.dumps(0))
0
```
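One detail worth noting: the bytes above start with 0xe9, which is the type code 'i' (0x69) with FLAG_REF (0x80) set, so marshal.dumps(0) produces an 'i' record and this round trip never enters the TYPE_LONG reader. A TYPE_LONG record with n == 0 has to be built by hand; the crafted bytes below are illustrative input, not something marshal.dumps() emits.

```python
import marshal
import struct

FLAG_REF = 0x80

data = marshal.dumps(0)
print(hex(data[0]), chr(data[0] & ~FLAG_REF))  # raw type byte, then type code

# Hand-built TYPE_LONG record: type 'l' followed by a 32-bit digit count of 0.
crafted = b'l' + struct.pack('<i', 0)
print(marshal.loads(crafted))
```

This hand-built record is the kind of input that the n == 0 special case discussed in this thread has to handle.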

Member:

If n==0, 1 + (Py_ABS(n) - 1) / marshal_ratio gives size=0, no?

size=0 is annoying; for example, the _w_digitsXX() functions use size - 1.

Please keep if (n == 0) return (PyObject *)_PyLong_New(0); to avoid this annoying case.

> But 0 has one digit allocated anyway. So, all works correctly:

OK, but I'm thinking about invalid or specially crafted marshal data. I would prefer to avoid a crash if possible.

@skirpichev (Member, Author):

> if n==0, 1 + (Py_ABS(n) - 1) / marshal_ratio gives size=0, no?

No. (-1)/positive is 0 in C.

> Please keep if (n == 0) return (PyObject *)_PyLong_New(0)

Then we will depend on this import.

> I'm thinking about invalid/special marshal data.

An example?

Member:

> No. (-1)/positive is 0 in C.

Oh, I forgot that. It might help future readers to add a comment explaining that size=1 if n=0.

Member:

Maybe also add an assertion: assert(size >= 1).
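The size=1 result for n == 0 relies on C's / truncating toward zero (guaranteed since C99), unlike Python's //, which floors. A small Python sketch that emulates the C expression makes the difference visible; c_div and marshal_size are hypothetical helpers, and marshal_ratio is assumed to be 2 with 30-bit digits and 1 with 15-bit digits.

```python
def c_div(a: int, b: int) -> int:
    """Integer division truncating toward zero, like C's '/' (C99+)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def marshal_size(n: int, marshal_ratio: int) -> int:
    """Emulate: Py_ssize_t size = 1 + (Py_ABS(n) - 1) / marshal_ratio;"""
    return 1 + c_div(abs(n) - 1, marshal_ratio)


for ratio in (2, 1):
    print(f"ratio={ratio}: C-style size={marshal_size(0, ratio)}, "
          f"Python floor size={1 + (abs(0) - 1) // ratio}")
```

With a ratio of 2, C's truncation gives size=1 for n == 0, as described in the thread. If marshal_ratio can be 1 (builds with 15-bit digits), (-1)/1 is -1 in C and the expression gives size=0, which is exactly the case an assert(size >= 1) would catch.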

@skirpichev requested a review from vstinner on January 8, 2025 07:53
@vstinner (Member) left a comment:

LGTM. I'm (just) a bit sad that using PEP 757 makes the code slower.

@serhiy-storchaka @picnixz: Would you mind reviewing this change?
