Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiops #295

Merged
merged 7 commits into from
Jan 4, 2024
Merged

Multiops #295

merged 7 commits into from
Jan 4, 2024

Conversation

jreadey
Copy link
Member

@jreadey jreadey commented Dec 18, 2023

Add ability to get multiple attributes via PUT attributes or write multiple attributes with POST attributes.

@jreadey jreadey requested a review from mattjala December 18, 2023 17:00
new_attribute = False # set this if we have any new attributes
for attr_name in items:
attribute = items[attr_name]
if attr_name in attributes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attributes should have a name that makes it clear that it's a list of attributes that already exist/are already in storage. Maybe retrieved_attributes? This would also be good for attr_dict in POST_Attributes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The attributes are the current set of attributes, so I think I'll keep it the way it is. Realize there's potential for confusion in other methods - will review.

hsds/attr_sn.py Outdated Show resolved Hide resolved
hsds/attr_sn.py Outdated Show resolved Hide resolved
self._objs = None


async def get_attributes(self, obj_id, attr_names):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be merged with _get_attributes from attr_sn? The domain crawler self seems like it contains the app and parameter fields needed for _get_attributes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Created a getAttributes function in servicenode_lib.py that both attr_sn and DomainCrawl call.

hsds/domain_crawl.py Outdated Show resolved Hide resolved
hsds/domain_crawl.py Outdated Show resolved Hide resolved
tests/integ/attr_test.py Outdated Show resolved Hide resolved
hsds/attr_sn.py Outdated Show resolved Hide resolved
hsds/attr_sn.py Outdated Show resolved Hide resolved
@mattjala
Copy link
Contributor

Here is a list of cases where either HSDS or the client can attempt to use an overlong URL due to attributes or links having long names:

Client request is safe, but can cause HSDS to use an overlong URL

  • POST_Dataset, dset_sn.py:1182 -> PUT_Link
  • POST_Datatype, ctype_sn.py:238 -> PUT_Link
  • POST_Group, group_sn.py:238 -> PUT_Link

Both client request and resultant HSDS request can use an overlong URL

  • PUT_Link, link_sn.py:286 -> PUT_Link
  • GET_Link, link_sn.py:158 -> GET_Link
  • DELETE_Link, link_sn.py:363 -> DELETE_Link
  • GET_Attribute, attr_sn.py:183 -> GET_Attribute
  • GET_AttributeValue, attr_sn.py:514, -> GET_Attribute
  • PUT_Attribute, attr_sn.py:390 -> PUT_Attribute
  • DELETE_Attribute, attr_sn.py:451, -> DELETE_Attribute
  • PUT_AttributeValue, attr_sn.py:642, attr_sn.py:750 -> GET_Attribute, PUT_Attribute

These API endpoints expect a potentially overlong h5path query string in the URL, and then make a potentially overlong GET_Link request in getObjectIdByPath

  • GET_Domain
  • GET_Datatype
  • GET_Group
  • GET_Dataset

POST_Attributes and PUT_Attributes as implemented in this PR would provide a safe alternative to the GET_Attribute/GET_AttributeValue/PUT_Attribute/PUT_AttributeValue API endpoints by storing long attribute names in the request body.

The HDF5 Library doesn't formally specify an attribute name limit, and only attribute names up to 255 bytes are tested. For this reason, I think we can safely consider attribute names that cause a URL to exceed aiohttp's default limit (8190 bytes) to be unsupported. In that case, we wouldn't need to worry about changing DELETE_Attribute.

The changes that would need to be made to avoid all other potentially overlong URLs:

  • PUT_Link should take link name in the request body instead of the query string.
  • DELETE_Link would need to be implemented under POST_Link or PUT_Link.
  • GET_* would need to be implemented under POST_* or PUT_* for links, groups, datasets, datatypes, and domains.

The latter two changes don't seem like perfect ideas - moving all the GET functionality under POST/PUT could be be unintuitive or duplicate behavior. Some other possible solutions to handle the long URL cases in GET_*/DELETE_* endpoints:

  • Increase the max URL length accepted by HSDS through aiohttp, and hope that intermediaries between HSDS and the client accept a URL up to ~67 Kb.
  • Do not support link names above ~2048 bytes (generally considered an upper bound on 'reasonable' URL lengths). For reference, the library supports link names up to 65 Kb.
  • Allow GET and DELETE requests to provide a body containing the target link name/target h5path. This is probably bad practice, and also requires the user to change how they provide the link name based on how long it is. If we require the link name to always be provided in the body, then that's a breaking API change, but the API going forward would be more consistent.

):
""" get the requested set of attributes from the given object """
if attr_names is None:
# TBD - fetch all attributes is this is None?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should fetch all attributes to match the commented out test in attr_test.py

@@ -1703,7 +1704,14 @@ def testPostAttributeMultiple(self):
self.assertEqual(len(root_attrs), 2)
dset_attrs = attributes[dset_id]
self.assertEqual(len(dset_attrs), 1) # one of the ones we asked for didn't exist


# test with no providing all attribute names - should return all attributes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be enabled or removed

for grp_id in grp_ids:
for i in range(grp_count):
grp_id = grp_ids[i]
print(f"grp_id: {grp_id}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove print statement

mattjala
mattjala previously approved these changes Dec 20, 2023
domain=self.base_domain, username="test_user2"
)
headers = helper.getRequestHeaders(domain=self.base_domain, username="test_user2")
attr_payload = {"type": "H5T_STD_I32LE", "value": 42}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails due to attr_payload being undefined if user2_name isn't found in the config - this should either have a default second username, or the requests should only be made if a second username is found.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

hsds/attr_dn.py Outdated
else:
encoding = None

if "seperator" in params:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seperator -> separator

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

hsds/attr_sn.py Outdated
else:
encoding = None

if "seperator" in params:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seperator -> separator

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed



async def deleteAttributes(app, obj_id, attr_names=None, seperator="/", bucket=None):
""" get the requested set of attributes from the given object """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description should be delete the requested set of attributes from the given object

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

attr_names_param = base64.b64encode(attr_name.encode("utf8")).decode("ascii")
# specify a seperator since our attribute name has the default slash
params = {"attr_names": attr_names_param, "encoding": "base64", "seperator": "!"}
rsp = self.session.delete(req, params=params, headers=headers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should include a test that deletes multiple attributes at once through DELETE_Attributes. I wrote the following while testing the API:

    def testDeleteAttributesMultiple(self):
        print("testDeleteAttributesMultiple", self.base_domain)
        headers = helper.getRequestHeaders(domain=self.base_domain)
        req = self.endpoint + "/"

        # Get root uuid
        rsp = self.session.get(req, headers=headers)
        self.assertEqual(rsp.status_code, 200)
        rspJson = json.loads(rsp.text)
        root_uuid = rspJson["root"]
        helper.validateId(root_uuid)

        # create a subgroup
        req = self.endpoint + "/groups"
        rsp = self.session.post(req, headers=headers)
        self.assertEqual(rsp.status_code, 201)
        rspJson = json.loads(rsp.text)
        grp_id = rspJson["id"]
        self.assertTrue(helper.validateId(grp_id))

        # link as "grp1"
        grp_name = "grp1"
        req = self.endpoint + "/groups/" + root_uuid + "/links/" + grp_name
        payload = {"id": grp_id}
        rsp = self.session.put(req, data=json.dumps(payload), headers=headers)
        self.assertEqual(rsp.status_code, 201)  # created

        # Create attributes
        for i in range(10):
            attr_name = "attr" + str(i * 100)
            value = [i]
            data = {"type": "H5T_IEEE_F32LE", "shape": 1, "value": value}
            req = self.endpoint + "/groups/" + grp_id + "/attributes/" + attr_name
            rsp = self.session.put(req, data=json.dumps(data), headers=headers)
            self.assertEqual(rsp.status_code, 201) # Created

        # Delete all by parameter
        attr_names = ["attr" + str(i * 100) for i in range(10)]
        separator = '/'

        req = self.endpoint + "/groups/" + grp_id + "/attributes?attr_names=" + separator.join(attr_names)
        rsp = self.session.delete(req, headers=headers)
        self.assertEqual(rsp.status_code, 200)

        # Attempt to read deleted attributes
        for i in range(10):
            req = self.endpoint + "/groups/" + grp_id + "/attributes/" + "attr" + str(i)
            rsp = self.session.get(req, headers=headers)
            self.assertEqual(rsp.status_code, 404)

        # Create another batch of attributes
        for i in range(10):
            attr_name = "attr" + str(i * 100)
            value = [i]
            data = {"type": "H5T_IEEE_F32LE", "shape": 1, "value": value}
            req = self.endpoint + "/groups/" + grp_id + "/attributes/" + attr_name
            rsp = self.session.put(req, data=json.dumps(data), headers=headers)
            self.assertEqual(rsp.status_code, 201) # Created

        # Delete with custom separator
        attr_names = ["attr" + str(i * 100) for i in range(10)]
        separator = ':'

        req = self.endpoint + "/groups/" + grp_id + "/attributes?attr_names=" + separator.join(attr_names) + "&seperator=" + separator
        rsp = self.session.delete(req, headers=headers)
        self.assertEqual(rsp.status_code, 200)

        # Attempt to read
        for i in range(10):
            req = self.endpoint + "/groups/" + grp_id + "/attributes/" + "attr" + str(i)
            rsp = self.session.get(req, headers=headers)
            self.assertEqual(rsp.status_code, 404)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Added test

@mattjala
Copy link
Contributor

mattjala commented Jan 2, 2024

At this point, there are four different ways to "get the value of an attribute" - GET_Attribute, GET_Attributes, GET_AttributeValue, and POST_Attributes, and two different ways to get the values of multiple attributes. Is there a motive for each of these different methods to continue to exist beyond backwards compatibility?

@jreadey jreadey merged commit 45f8644 into master Jan 4, 2024
9 of 10 checks passed
@jreadey jreadey deleted the multiops branch April 29, 2024 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants