Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify artifact link [Non Backward Compatible!] #1494

Merged
merged 3 commits into from
Jan 8, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 15 additions & 15 deletions docs/docs/saving_and_loading_from_catalog.rst
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ It's also possible to add artifacts to the library's default catalog:
Using Catalog Assets
--------------------

To use catalog objects, simply specify their name in the Unitxt object that will use them.
To use catalog objects, simply specify their name in the Unitxt object that will use them.

.. code-block:: python

Expand All @@ -56,8 +56,8 @@ To use catalog objects, simply specify their name in the Unitxt object that will
Modifying Catalog Assets on the Fly
-----------------------------------

To modify a catalog asset's fields dynamically, upon fetching the asset from the catalog, use the syntax: ``artifact_name[key_to_modify=new_value]``.
To assign lists, use: ``artifact_name[key_to_modify=[new_value_0, new_value_1]]``.
To modify a catalog asset's fields dynamically, upon fetching the asset from the catalog, use the syntax: ``artifact_name[key_to_modify=new_value]``.
To assign lists, use: ``artifact_name[key_to_modify=[new_value_0, new_value_1]]``.
To assign dictionaries, use: ``artifact_name[key_to_modify={new_key_0=new_value_0,new_key_1=new_value_1}]``.
Note that the whole new value of the field has to be specified; not just one item of a list, or one key of the dictionary.
For instance, to change the metric specification of a task:
Expand Down Expand Up @@ -85,20 +85,20 @@ Use ``get_from_catalog`` to directly access catalog assets, and obtain an asset
A Catalog Asset Linking to Another Catalog Asset
------------------------------------------------

A catalog asset can be just a link to another asset.
This feature comes handy when for some reason, we want to change the catalog name
of an existing asset (e.g. ``asset1`` to ``asset2``), while there is already code
A catalog asset can be just a link to another asset.
This feature comes handy when for some reason, we want to change the catalog name
of an existing asset (e.g. ``asset1`` to ``asset2``), while there is already code
that uses the old name of the asset and we want to avoid non-backward compatible changes.

In such a case, we can save the asset as ``asset2``, create an asset of type
In such a case, we can save the asset as ``asset2``, create an asset of type
:class:`ArtifactLink <unitxt.artifact.ArtifactLink>` that links to ``asset2``, and save
that one as ``asset1``.
When ``asset1`` is accessed from an existing code, Unixt Catalog realizes that the asset fetched from position ``asset1``
is an ``ArtifactLink``, so it continues and fetches ``asset2`` -- the Artifact linked to by ``asset1``.
When ``asset1`` is accessed from an existing code, Unixt Catalog realizes that the asset fetched from position ``asset1``
is an ``ArtifactLink``, so it continues and fetches ``asset2`` -- the Artifact linked to by ``asset1``.

.. code-block:: python

link_to_asset2 = ArtifactLink(artifact_linked_to="asset2")
link_to_asset2 = ArtifactLink(to="asset2")
add_to_catalog(
link_to_asset2,
"asset1",
Expand All @@ -109,8 +109,8 @@ Deprecated Asset
----------------

Every asset has a special field named ``__deprecated_msg__`` of type ``str``, whose default value is None.
When None, the asset is cocnsidered non-deprecated. When not None, the asset is considered deprecated, and
its ``__deprecated_msg__`` is logged at level WARN upon its instantiation. (Other than this logging,
When None, the asset is cocnsidered non-deprecated. When not None, the asset is considered deprecated, and
its ``__deprecated_msg__`` is logged at level WARN upon its instantiation. (Other than this logging,
the artifact is instantiated normally.)

Example of a deprecated catalog asset:
Expand All @@ -123,12 +123,12 @@ Example of a deprecated catalog asset:
"text": "You are an agent in charge of answering a boolean (yes/no) question. The system presents you with a passage and a question. Read the passage carefully, and then answer yes or no. Think about your answer, and make sure it makes sense. Do not explain the answer. Only say yes or no."
}

Combining this feature with ``ArtifactLink`` in the above example, we can also log a warning to the accessing code that
the name ``asset1`` is to be replaced by ``asset2``.
Combining this feature with ``ArtifactLink`` in the above example, we can also log a warning to the accessing code that
the name ``asset1`` is to be replaced by ``asset2``.

.. code-block:: python

link_to_asset2 = ArtifactLink(artifact_linked_to="asset2",
link_to_asset2 = ArtifactLink(to="asset2",
__deprecated_msg__="'asset1' is going to be deprecated. In future uses, please access 'asset2' instead.")
add_to_catalog(
link_to_asset2,
Expand Down
65 changes: 11 additions & 54 deletions src/unitxt/artifact.py
Original file line number Diff line number Diff line change
Expand Up @@ -282,10 +282,12 @@ def from_dict(cls, d, overwrite_args=None):
@classmethod
def load(cls, path, artifact_identifier=None, overwrite_args=None):
d = artifacts_json_cache(path)
if "artifact_linked_to" in d and d["artifact_linked_to"] is not None:
# d stands for an ArtifactLink
artifact_link = ArtifactLink.from_dict(d)
return artifact_link.load(overwrite_args)
if "__type__" in d and d["__type__"] == "artifact_link":
cls.from_dict(d) # for verifications and warnings
catalog, artifact_rep, _ = get_catalog_name_and_args(name=d["to"])
return catalog.get_with_overwrite(
artifact_rep, overwrite_args=overwrite_args
)

new_artifact = cls.from_dict(d, overwrite_args=overwrite_args)
new_artifact.__id__ = artifact_identifier
Expand Down Expand Up @@ -466,54 +468,11 @@ def __repr__(self):


class ArtifactLink(Artifact):
# the artifact linked to, expressed by its catalog id
artifact_linked_to: str = Field(default=None, required=True)
to: Artifact

@classmethod
def from_dict(cls, d: dict):
assert isinstance(d, dict), f"argument must be a dictionary, got: d = {d}."
assert (
"artifact_linked_to" in d and d["artifact_linked_to"] is not None
), f"A non-none field named 'artifact_linked_to' is expected in input argument d, but got: {d}."
artifact_linked_to = d["artifact_linked_to"]
# artifact_linked_to is a name of catalog entry
assert isinstance(
artifact_linked_to, str
), f"'artifact_linked_to' should be a string expressing a name of a catalog entry. Got{artifact_linked_to}."
msg = d["__deprecated_msg__"] if "__deprecated_msg__" in d else None
return ArtifactLink(
artifact_linked_to=artifact_linked_to, __deprecated_msg__=msg
)

def load(self, overwrite_args: dict) -> Artifact:
# identify the catalog for the artifact_linked_to
assert (
self.artifact_linked_to is not None
), "'artifact_linked_to' must be non-None in order to load it from the catalog. Currently, it is None."
assert isinstance(
self.artifact_linked_to, str
), f"'artifact_linked_to' should be a string (expressing a name of a catalog entry). Currently, its type is: {type(self.artifact_linked_to)}."
needed_catalog = None
catalogs = list(Catalogs())
for catalog in catalogs:
if self.artifact_linked_to in catalog:
needed_catalog = catalog

if needed_catalog is None:
raise UnitxtArtifactNotFoundError(self.artifact_linked_to, catalogs)

path = needed_catalog.path(self.artifact_linked_to)
d = artifacts_json_cache(path)
# if needed, follow, in a recursive manner, over multiple links,
# passing through instantiating of the ArtifactLink-s on the way, triggering
# deprecatioin warning as needed.
if "artifact_linked_to" in d and d["artifact_linked_to"] is not None:
# d stands for an ArtifactLink
artifact_link = ArtifactLink.from_dict(d)
return artifact_link.load(overwrite_args)
new_artifact = Artifact.from_dict(d, overwrite_args=overwrite_args)
new_artifact.__id__ = self.artifact_linked_to
return new_artifact
def verify(self):
if self.to.__id__ is None:
raise UnitxtError("ArtifactLink must link to existing catalog entry.")


def get_raw(obj):
Expand Down Expand Up @@ -577,14 +536,12 @@ def fetch_artifact(artifact_rep) -> Tuple[Artifact, Union[AbstractCatalog, None]
"""
if isinstance(artifact_rep, Artifact):
if isinstance(artifact_rep, ArtifactLink):
return fetch_artifact(artifact_rep.artifact_linked_to)
return fetch_artifact(artifact_rep.to)
return artifact_rep, None

# If local file
if isinstance(artifact_rep, str) and Artifact.is_artifact_file(artifact_rep):
artifact_to_return = Artifact.load(artifact_rep)
if isinstance(artifact_rep, ArtifactLink):
artifact_to_return = fetch_artifact(artifact_to_return.artifact_linked_to)

return artifact_to_return, None

Expand Down
2 changes: 1 addition & 1 deletion src/unitxt/catalog.py
Original file line number Diff line number Diff line change
Expand Up @@ -153,7 +153,7 @@ def add_link_to_catalog(
deprecated_msg = None

artifact_link = ArtifactLink(
artifact_linked_to=artifact_linked_to, __deprecated_msg__=deprecated_msg
to=artifact_linked_to, __deprecated_msg__=deprecated_msg
)

add_to_catalog(
Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
{
"__type__": "artifact_link",
"artifact_linked_to": "augmentors.text.whitespace_prefix_suffix",
"to": "augmentors.text.whitespace_prefix_suffix",
"__deprecated_msg__": "Artifact 'augmentors.augment_whitespace_prefix_and_suffix_task_input' is deprecated. Artifact 'augmentors.text.whitespace_prefix_suffix' will be instantiated instead. In future uses, please reference artifact 'augmentors.text.whitespace_prefix_suffix' directly."
}
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
{
"__type__": "artifact_link",
"artifact_linked_to": "augmentors.text.whitespace_prefix_suffix",
"to": "augmentors.text.whitespace_prefix_suffix",
"__deprecated_msg__": "Artifact 'augmentors.augment_whitespace_task_input' is deprecated. Artifact 'augmentors.text.whitespace_prefix_suffix' will be instantiated instead. In future uses, please reference artifact 'augmentors.text.whitespace_prefix_suffix' directly."
}
2 changes: 1 addition & 1 deletion src/unitxt/catalog/tasks/qa/with_context/abstractive.json
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
{
"__type__": "artifact_link",
"artifact_linked_to": "tasks.qa.with_context",
"to": "tasks.qa.with_context",
"__deprecated_msg__": null
}
2 changes: 1 addition & 1 deletion src/unitxt/catalog/tasks/qa/with_context/extractive.json
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
{
"__type__": "artifact_link",
"artifact_linked_to": "tasks.qa.extractive",
"to": "tasks.qa.extractive",
"__deprecated_msg__": null
}
4 changes: 2 additions & 2 deletions tests/library/test_artifact.py
Original file line number Diff line number Diff line change
Expand Up @@ -349,7 +349,7 @@ def test_artifact_link_with_deprecation_warning(self):

with self.assertWarns(DeprecationWarning):
rename_fields = ArtifactLink(
artifact_linked_to="rename.for.test.artifact.link",
to="rename.for.test.artifact.link",
__deprecated_msg__="Artifact is deprecated. "
"'rename.for.test.artifact.link' is now instantiated instead. "
"\nIn the future, please use 'rename.for.test.artifact.link'.",
Expand Down Expand Up @@ -536,7 +536,7 @@ def test_artifact_link_in_recursive_load(self):
)

link_to_copy_operator = ArtifactLink(
artifact_linked_to="copy.operator",
to="copy.operator",
)
add_to_catalog(
link_to_copy_operator,
Expand Down
Loading