-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch from XML::Simple to XML::LibXML or XML::Fast #66
Comments
While XML::Fast may be better than XML::Simple (I don't know if this is the case), it doesn't solve the fundamental problems that make XML::Simple a bad idea -- having to deal with endless edge cases of items that could be arrays or single values, and tags that have both attributes and contents... this will be a problem in any XML-to-hash converter. Using XML::LibXML or something like Mojo::DOM58 or another DOM traversal parser would solve this issue, but would be a much trickier conversion, you have to change how you approach the data. |
+1: XML as a datasctructure is a pain :(. Be sure some ugly code is laying around because of that fact. If you have any suggestions of how you would use XML::LibXML or Mojo::DOM58 in Paws, I'd love to see them. With the branch I've just pushed I was playing around to see if a quick win is possible (I haven't benchmarked it yet, but it feels faster). Responses from S3 ListObjects with lots of elements are slow with XML::Simple, and seem to be faster with XML::Fast. The bad part is XML::Fast has not been maintained since 2010, so maybe it's not very dependable as a Paws core dep. |
I will take a look when I get a chance at how a DOM object might be used, but I don't understand enough of the code using XML responses yet. |
I've just found out about https://metacpan.org/pod/XML::Compile. It could be an option. |
I recommend leaving XML::Simple for security reasons in addition to performance. There is a significant XXE vulnerability that was reported 4 years ago in RT for XML::Simple. There continues to be no fix or work around that I am aware of. Without doing a code review of every use of XML::Simple to parse XML data in Paws, I'm not exactly sure how the existing vulnerability might be exploited. Basically it works like this, though, given an XML file that has a particular kind of doctype entity tag, XML::Simple will read any arbitrary file from the local disk and embed it into the parsed XML document tree. If the program using that document tree at all passes that data on, it can get logged or otherwise passed through to something else to leak information like password files. XML::LibXML::Simple can usually be used as a drop-in replacement for parsing XML data while XML::Simple can still be used for writing XML safely (I don't believe XML::LibXML::Simple can generate XML). |
@zostay Thanks for the heads-up about XML::LibXML::Simple (didn't know it existed, and was drop-in for XML::Simple) |
Hi,
The problem seems to be that Marker, Prefix, etc, etc are empty HashRefs. The XML::LibXML::Simple documentation states that one of the differences with XML::Simple is that empty elements are not removed. https://metacpan.org/pod/XML::LibXML::Simple#empty-elements-are-not-removed. AWS returns:
|
I've started taking a look at how Paws could be converted to use a proper XML parser. The fundamental issue is that when doing it correctly, you do not deserialize the XML to a data structure at all but an object which you then query. Thus the method response_to_object in Paws::API::Caller would have to understand this. It currently appears to assume that unserialize_response will return a data structure. Either Paws::Net::XMLResponse and Paws::Net::RestXMLResponse would have to define a new method which returns the XML object, which Paws::API::Caller would then call and use based on some condition, or it would have to check whether unserialize_response returns an object rather than a basic data structure and handle it appropriately. What are your thoughts on the best way to do this in keeping with the existing structure? Mainly, response_to_object also has to handle JSON responses which deserialize cleanly to plain data structures. If this fundamental issue can be worked out then it makes little difference whether XML::LibXML or Mojo::DOM58 or whatever other parser is used afterward. (And I don't consider XML::LibXML::Simple or XML::Fast or other xml-to-hash converters to be a real solution to this problem.) |
Another kind of crazy idea: put JSON responses in an object too, using JSON pointers to query them. Unfortunately the implementation I'd want to use for that is Mojo::JSON::Pointer and it's a bit silly to depend on Mojolicious just for that. (JSON::Pointer seems super overcomplicated for this task) |
A further problem: response_to_object primarily builds the response object by passing it the plain data structure it receives and relying on the constructor to build the object from there. If deserializing to an object instead of a plain data structure, it wouldn't be able to automatically build it this way; it needs to somehow query the object for each element it's looking for. |
Perhaps this could be solved by a new method for XMLResponse and RestXMLResponse as well: new_from_result_object or so. |
So it basically boils down to this: what would be the most appropriate way for response_to_object to determine that it needs to work with an XML object (from XMLResponse or RestXMLResponse) rather than a plain data structure (from JsonResponse or RestJsonResponse)? |
Take a look at the code if the refactored code in this branc helps https://github.com/pplu/aws-sdk-perl/tree/release/0.37 |
Oh, on that branch each response type defines its own response_to_object, that should make things a lot easier. |
🎉 🎈 |
We did some internal tests, and And all we did was basically update - my $struct = eval { $self->_xml_parser->parse_string($response->content) };
+ my $struct = eval { xml2hash($response->content) }; Would you be interested if we prepared a PR? IIRC, Thanks |
XML::Hash::XS has the exact same design flaws as XML::Simple, and would require checking that edge cases are all handled the same way. |
Sorry that I've failed to complete this effort for so long, I will hopefully find some time to get back to it. |
For sure there could be a more elegant way to handle responses in the long term that can both be efficient and ensure correctness. Thanks for thinking about it and would be great to see what you come up with. My immediate concern is that in the meantime we are leaving a lot of performance on the table by using There is of course the benefit of Thanks. |
In the past I tried changing the XML Parser (It's documented in this ticket) without much luck due to difference of handling edge cases, but at least the test suite catches these things. @slobo @Grinnz : I'd be open to shipping with XML::Hash::XS if the tests pass, as the test suite should be quite thorough with XML responses and their edge cases. Of course, if @Grinnz can build the proper solution I'd be even happier of getting that merged. |
I've just opened the branch https://github.com/pplu/aws-sdk-perl/tree/xml2hash to see how tests look. I've substituted XML::Simple for XML::Hash::XS, and there are test failures. They basically seem related to the fact that Paws configures XML::Simple to undefine empty tags @slobo : please feel free to work on that branch to get everything going. I'm pretty sure that everything will revolve around getting XML::Hash::XS to produce the same hashes as XML::Simple with these options: https://github.com/pplu/aws-sdk-perl/blob/master/lib/Paws/Net/XMLResponse.pm#L10 |
I will see if I have some time I will try it aginst this branch https://github.com/byterock/aws-sdk-perl/tree/s3ObjectTagging where all of the S3 actions are now working. The only problem I found with the XML parsing was the S3 GetBucketLocation where the root node was dropped from the pasing For notes check out these blog posts http://blogs.perl.org/users/byterock/2019/10/paws-iv-the-sun-king.html By the way the 10_response test suite in that branch has test for all of the actions (excelpt a few depricated ones that no longer work) of S3. |
Awesome, I've created #366 to explore this. |
XML::Simple is discouraged, although it works for us. Maybe a faster and better XML parser would be nice
The text was updated successfully, but these errors were encountered: