Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improves the deserialization speed by reducing object initialization overhead #284

Merged
merged 6 commits into from
Sep 19, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions amazon/ion/equivalence.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@
from amazon.ion.symbols import SymbolToken


def isinstance_of_ion_nature(obj):
def isinstance_of_ion_nature_or_new_ion_py_objects(obj):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method will be deprecated with other old methods later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right... But what will replace it? I ask because I think that would be either just trying to get the iontype and annotations or testing if the two objects hasattr them. So why not do that now and avoid the series of ors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, refactored it to check if both ion_type and ion_annotations attributes exist.

return isinstance(obj, _IonNature) or isinstance(obj, IonPyNull_new) \
or isinstance(obj, IonPyDecimal_new) or isinstance(obj, IonPyInt_new) \
or isinstance(obj, IonPyFloat_new) or isinstance(obj, IonPyText_new) \
Expand Down Expand Up @@ -70,8 +70,8 @@ def _ion_equals_timestamps_data_model(a, b):
def _ion_equals(a, b, timestamp_comparison_func, recursive_comparison_func):
"""Compares a and b according to the description of the ion_equals method."""
for a, b in ((a, b), (b, a)): # Ensures that operand order does not matter.
if isinstance_of_ion_nature(a):
if isinstance_of_ion_nature(b):
if isinstance_of_ion_nature_or_new_ion_py_objects(a):
if isinstance_of_ion_nature_or_new_ion_py_objects(b):
# Both operands have _IonNature. Their IonTypes and annotations must be equivalent.
eq = a.ion_type is b.ion_type and _annotations_eq(a, b)
else:
Expand Down
87 changes: 34 additions & 53 deletions amazon/ion/ion_py_objects.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,20 +6,18 @@


class IonPyNull_new(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may discover a reasonable "why" later, but barring that, ditch the _new suffix please.

Copy link
Contributor Author

@cheqianh cheqianh Aug 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will keep this comment unsolved until renaming them.

I named them object, object_new and object_original so that we can separate them in the existing tests for easier debug and performance comparisons.

EDIT: Once we approve other comments within this PR, I'll rename these classes to their appropriate names

Copy link
Contributor

@rmarrowstone rmarrowstone Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please confirm this is still the plan. [refers to resolved conversation about removing _new classes before actually merging]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'll rename it once we address other comments. I'm using these class names for easier before/after performance comparison...

Copy link
Contributor Author

@cheqianh cheqianh Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed the updated question. Would you also refer to removing the old IonPyObject? yeah I think we can completely switch to the _new IonPyObject and rename them to the official IonPyObject. This won't cause breaking changes. (I'm not sure if the pure python implementation still need old IonPyObjects. If they are, I'll leave a comment for further discussion otherwise I'll safely deprecate these old IonPyObjects, maybe a separate sub-PR targeting this one for all the clean up work)

I left draft comments and tried to send all of them at once but seems the comment doesn't cache draft once I click resolve conversation.

Copy link
Contributor Author

@cheqianh cheqianh Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest commit b6fe2df renames IonPyObject_new to IonPyObject, we can safely remove IonPyObject_original now.

ion_annotations = ()
ion_type = IonType.NULL # TODO initialized to NULL type first, what's the real type?

__name__ = 'IonPyNull_new'
__qualname__ = 'IonPyNull_new'
ion_type = IonType.NULL

def __init__(self, ion_type=None, value=None, annotations=()):
self.ion_type = ion_type
def __init__(self, ion_type=IonType.NULL, value=None, annotations=()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we drop the value parameter? Or is it needed for consistency with the other __init__ implementations?

Copy link
Contributor Author

@cheqianh cheqianh Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to maintain the constructors' signatures identical to from_value's, so that we can perform an in-place replacement. This approach can also help us quantify the overhead caused by from_value. For example, switching from from_value to their constructors introduces an 8-12% improvement in performance.

value is not necessarily required after the PR is merged. I added a clean-up work 3 above to drop this parameter.

self.ion_type = ion_type # TODO initialized to NULL type first, what's the real type?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the action here? I would say it's OK as it is... or suggest that for IonPyNull we could avoid the class attribute. But I don't understand if this points to a deeper issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

self.ion_annotations = annotations

def __bool__(self):
return False

def _copy(self):
def __copy__(self):
args, kwargs = self._to_constructor_args(self)
value = self.__class__(*args, **kwargs)
value.ion_type = self.ion_type
Expand Down Expand Up @@ -68,18 +66,16 @@ def to_event(self, event_type, field_name=None, in_struct=False, depth=None):


class IonPyDecimal_new(Decimal):
ion_annotations = ()
ion_type = IonType.DECIMAL

__name__ = 'IonPyDecimal_new'
__qualname__ = 'IonPyDecimal_new'
ion_type = IonType.DECIMAL

def __new__(cls, ion_type, value, annotations=()):
def __new__(cls, ion_type=IonType.DECIMAL, value=None, annotations=()):
v = super().__new__(cls, value)
v.ion_annotations = annotations
return v

def _copy(self):
def __copy__(self):
args, kwargs = self._to_constructor_args(self)
value = self.__class__(*args, **kwargs)
value.ion_type = self.ion_type
Expand Down Expand Up @@ -128,18 +124,16 @@ def to_event(self, event_type, field_name=None, in_struct=False, depth=None):


class IonPyInt_new(int):
ion_annotations = ()
ion_type = IonType.INT

__name__ = 'IonPyInt_new'
__qualname__ = 'IonPyInt_new'
ion_type = IonType.INT

def __new__(cls, ion_type, value, annotations=()):
def __new__(cls, ion_type=IonType.INT, value=None, annotations=()):
v = super().__new__(cls, value)
v.ion_annotations = annotations
return v

def _copy(self):
def __copy__(self):
args, kwargs = self._to_constructor_args(self)
value = self.__class__(*args, **kwargs)
value.ion_type = self.ion_type
Expand Down Expand Up @@ -188,21 +182,19 @@ def to_event(self, event_type, field_name=None, in_struct=False, depth=None):


class IonPyBool_new(int):
ion_annotations = ()
ion_type = IonType.BOOL

__name__ = 'IonPyBool_new'
__qualname__ = 'IonPyBool_new'
ion_type = IonType.BOOL

def __repr__(self):
return str(bool(self))

def __new__(cls, ion_type, value, annotations=()):
def __new__(cls, ion_type=IonType.BOOL, value=None, annotations=()):
v = super().__new__(cls, value)
v.ion_annotations = annotations
return v

def _copy(self):
def __copy__(self):
args, kwargs = self._to_constructor_args(self)
value = self.__class__(*args, **kwargs)
value.ion_type = self.ion_type
Expand Down Expand Up @@ -251,18 +243,16 @@ def to_event(self, event_type, field_name=None, in_struct=False, depth=None):


class IonPyFloat_new(float):
ion_annotations = ()
ion_type = IonType.FLOAT

__name__ = 'IonPyFloat_new'
__qualname__ = 'IonPyFloat_new'
ion_type = IonType.FLOAT

def __new__(cls, ion_type, value, annotations=()):
def __new__(cls, ion_type=IonType.FLOAT, value=None, annotations=()):
v = super().__new__(cls, value)
v.ion_annotations = annotations
return v

def _copy(self):
def __copy__(self):
args, kwargs = self._to_constructor_args(self)
value = self.__class__(*args, **kwargs)
value.ion_type = self.ion_type
Expand Down Expand Up @@ -311,18 +301,16 @@ def to_event(self, event_type, field_name=None, in_struct=False, depth=None):


class IonPyText_new(str):
ion_annotations = ()
ion_type = IonType.STRING

__name__ = 'IonPyText_new'
__qualname__ = 'IonPyText_new'
ion_type = IonType.STRING

def __new__(cls, ion_type=None, value=None, annotations=()):
def __new__(cls, ion_type=IonType.STRING, value=None, annotations=()):
v = super().__new__(cls, value)
v.ion_annotations = annotations
return v

def _copy(self):
def __copy__(self):
args, kwargs = self._to_constructor_args(self)
value = self.__class__(*args, **kwargs)
value.ion_type = self.ion_type
Expand Down Expand Up @@ -371,19 +359,17 @@ def to_event(self, event_type, field_name=None, in_struct=False, depth=None):


class IonPySymbol_new(SymbolToken):
ion_annotations = ()
ion_type = IonType.SYMBOL

__name__ = 'IonPySymbol_new'
__qualname__ = 'IonPySymbol_new'
ion_type = IonType.SYMBOL

# a good signature: IonPySymbol(ion_type, symbol_token, annotation)
def __new__(cls, ion_type, value, annotations=()):
def __new__(cls, ion_type=IonType.SYMBOL, value=None, annotations=()):
v = super().__new__(cls, *value)
v.ion_annotations = annotations
return v

def _copy(self):
def __copy__(self):
args, kwargs = self._to_constructor_args(self)
value = self.__class__(*args, **kwargs)
value.ion_type = self.ion_type
Expand Down Expand Up @@ -437,19 +423,19 @@ def to_event(self, event_type, field_name=None, in_struct=False, depth=None):


class IonPyList_new(list):
ion_annotations = ()
ion_type = IonType.LIST

__name__ = 'IonPyList_new'
__qualname__ = 'IonPyList_new'
ion_type = IonType.LIST
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like with null I would not have the class attribute for List.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.


def __init__(self, ion_type, value, annotations=()):
def __init__(self, ion_type=IonType.LIST, value=None, annotations=()):
if value is None:
value = []
super().__init__(value)
self.ion_annotations = annotations
# it's possible to be Sexp
self.ion_type = ion_type
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the constructor still can control ion_type as it's possible to be Sexp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we call this IonPySequence or something, then? Or does this just need to conform to the existing naming?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, It won't break any current use case since we can call it any name we want (E.g., we are calling it an unofficial name with the _new suffix). Since the original list and Sexp are called IonPyList, this might affect some write use case (Not sure, maybe for the read use case, people just need to use IonPyList as a native list so they don't need to worry about its class name?).

Or maybe we can redeclare it to ionPyList in simple_types.py first, then fully deprecate it when we make breaking changes in the future.

ionPyList = ionPySequence


def _copy(self):
def __copy__(self):
args, kwargs = self._to_constructor_args(self)
value = self.__class__(*args, **kwargs)
value.ion_type = self.ion_type
Expand Down Expand Up @@ -498,22 +484,17 @@ def to_event(self, event_type, field_name=None, in_struct=False, depth=None):


class IonPyDict_new(MutableMapping):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is existing pydoc about the contract for setting, deleting and accessing fields with multiple mappings please ensure it is moved. If not, now is a great time to add it! :)

Copy link
Contributor Author

@cheqianh cheqianh Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I'm working on adding a method documents according to the current behavior.

ion_annotations = ()
ion_type = IonType.STRUCT

__name__ = 'IonPyDict_new'
__qualname__ = 'IonPyDict_new'
ion_type = IonType.STRUCT

# one good initialization example: my_dict = IonPyDict(ion_type, value, annotations)
def __init__(self, *args, **kwargs):
def __init__(self, ion_type=IonType.STRUCT, value=None, annotations=()):
super().__init__()
self.ion_annotations = annotations
self.__store = {}
cheqianh marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the resolved comment thread this was to change to OrderedDict.

Copy link
Contributor Author

@cheqianh cheqianh Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. Reverted by accident when I benchmark before/after commits.

if args is not None:
if len(args) > 1 and args[1] is not None:
for key, value in iter(args[1].items()):
self.__store[key] = [value]
if len(args) > 2 and args[2] is not None:
self.ion_annotations = args[2]
if value is not None:
for key, value in iter(value.items()):
self.__store[key] = [value]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this overwrite when value.items() has multiple mappings for a field?

Copy link
Contributor Author

@cheqianh cheqianh Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter value of the original multimap constructor is limited to Python's native dictionary. I added logic to support multi-value Map.


def __getitem__(self, key):
return self.__store[key][len(self.__store[key]) - 1] # Return only one in order not to break clients
Expand Down Expand Up @@ -557,7 +538,7 @@ def items(self):
output.append((k, v))
return output

def _copy(self):
def __copy__(self):
args, kwargs = self._to_constructor_args(self)
value = self.__class__(*args, **kwargs)
value.ion_type = self.ion_type
Expand Down
5 changes: 2 additions & 3 deletions tests/test_core_multimap.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
from typing import Sequence, NamedTuple
cheqianh marked this conversation as resolved.
Show resolved Hide resolved

# from amazon.ion.core import Multimap
from amazon.ion.ion_py_objects import IonPyDict_new as Multimap # replace the original multiMap with the new IonPyDict
from amazon.ion.core import Multimap

from tests import parametrize

Expand Down Expand Up @@ -64,7 +63,7 @@ def test_delete_item(item):
{"a": 1, "b": 2, "c": [1, 2, {3: 4}]}
)
def test_constructor(d):
m = Multimap(None, d)
m = Multimap(d)
for k, v in iter(d.items()):
assert m[k] == v
assert len(m) == len(d)
69 changes: 69 additions & 0 deletions tests/test_multimap_IonPyDict.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
from typing import Sequence, NamedTuple

from amazon.ion.ion_py_objects import IonPyDict_new as Multimap

from tests import parametrize


class _P(NamedTuple):
pairs: Sequence
expected_all_values: Sequence
expected_single_value: Sequence
expected_total_len: int

def __str__(self):
return '{name}'.format(name=self.pairs)


ALL_DATA = _P(
pairs=[('a', 1), ('a', 2), ('a', 3), ('a', [4, 5, 6]), ('b', 0), ('c', {'x': 'z', 'r': 's'})],
expected_all_values=[('a', [1, 2, 3, [4, 5, 6]]), ('b', [0]), ('c', [{'x': 'z', 'r': 's'}])],
expected_single_value=[('a', [4, 5, 6]), ('b', 0), ('c', {'x': 'z', 'r': 's'})],
expected_total_len=6
)


def _create_multimap_with_items(pairs):
m = Multimap()
for pair in pairs:
m.add_item(pair[0], pair[1])
return m


@parametrize(
ALL_DATA
)
def test_add_item(p):
m = _create_multimap_with_items(p.pairs)
for expected in p.expected_all_values:
assert list([x for x in m.get_all_values(expected[0])]) == expected[1]
for expected in p.expected_single_value:
assert m[expected[0]] == expected[1]
assert p.expected_total_len == len(m)


@parametrize(
(ALL_DATA, ["a"], 2),
(ALL_DATA, ["b"], 5),
(ALL_DATA, ["c"], 5),
(ALL_DATA, ["a", "b"], 1),
(ALL_DATA, ["a", "b", "c"], 0)
)
def test_delete_item(item):
m = _create_multimap_with_items(item[0].pairs)
p, items_to_remove, len_after_removal = item
for to_remove in items_to_remove:
del m[to_remove]
assert len(m) == item[2]


@parametrize(
{},
{"a": 1},
{"a": 1, "b": 2, "c": [1, 2, {3: 4}]}
)
def test_constructor(d):
m = Multimap(None, d)
for k, v in iter(d.items()):
assert m[k] == v
assert len(m) == len(d)
10 changes: 5 additions & 5 deletions tests/test_simpleion.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@
from amazon.ion.core import IonType, IonEvent, IonEventType, OffsetTZInfo, Multimap
from amazon.ion.simple_types import IonPyDict, IonPyText, IonPyList, IonPyNull, IonPyBool, IonPyInt, IonPyFloat, \
IonPyDecimal, IonPyTimestamp, IonPyBytes, IonPySymbol, _IonNature
from amazon.ion.equivalence import ion_equals, isinstance_of_ion_nature
from amazon.ion.equivalence import ion_equals, isinstance_of_ion_nature_or_new_ion_py_objects
from amazon.ion.simpleion import dump, dumps, load, loads, _ion_type, _FROM_ION_TYPE, _FROM_TYPE_TUPLE_AS_SEXP, \
_FROM_TYPE
from amazon.ion.writer_binary_raw import _serialize_symbol, _write_length
Expand Down Expand Up @@ -272,7 +272,7 @@ def generate_annotated_values_binary(scalars_map, container_map):
for value_p in chain(generate_scalars_binary(scalars_map, preceding_symbols=2),
generate_containers_binary(container_map, preceding_symbols=2)):
obj = value_p.obj
if not isinstance_of_ion_nature(obj):
if not isinstance_of_ion_nature_or_new_ion_py_objects(obj):
continue
obj.ion_annotations = (_st(u'annot1'), _st(u'annot2'),)
annot_length = 2 # 10 and 11 each fit in one VarUInt byte
Expand Down Expand Up @@ -441,7 +441,7 @@ def generate_annotated_values_text(scalars_map, container_map):
for value_p in chain(generate_scalars_text(scalars_map),
generate_containers_text(container_map)):
obj = value_p.obj
if not isinstance_of_ion_nature(obj):
if not isinstance_of_ion_nature_or_new_ion_py_objects(obj):
continue
obj.ion_annotations = (_st(u'annot1'), _st(u'annot2'),)

Expand Down Expand Up @@ -571,7 +571,7 @@ def _to_obj(to_type=None, annotations=(), tuple_as_sexp=False):
for obj in roundtrips:
obj = _adjust_sids()
yield obj, is_binary, indent, False
if not isinstance_of_ion_nature(obj):
if not isinstance_of_ion_nature_or_new_ion_py_objects(obj):
ion_type = _ion_type(obj, _FROM_TYPE)
yield _to_obj()
else:
Expand All @@ -592,7 +592,7 @@ def _assert_roundtrip(before, after, tuple_as_sexp):
# wraps each input value in _IonNature for comparison against the output.
def _to_ion_nature(obj):
out = obj
if not isinstance_of_ion_nature(out):
if not isinstance_of_ion_nature_or_new_ion_py_objects(out):
from_type = _FROM_TYPE_TUPLE_AS_SEXP if tuple_as_sexp else _FROM_TYPE
ion_type = _ion_type(out, from_type)
out = _FROM_ION_TYPE[ion_type].from_value(ion_type, out)
Expand Down