obspy response recovery is questionable #129
Replies: 4 comments
-
A different point arises if we dig into this. The site and channel related code needs to use the schema to remove hard-coded names like 'serialized_channel_data'. I am not the one to do that, for multiple reasons.
-
These inventory and catalog related database methods were created before we had the database API design. Now looking at it, maybe we should consider moving some of the methods into the preprocessing module? I don't think we have any tests for this group of methods right now; they are not called in the new database API.
-
It is a good idea, I think, to move all of those into mspasspy.preprocessing.seed, since all of them are preprocessing steps to build a valid site, channel, and/or source collection. The site and channel functions definitely belong in the seed directory. We may want a different directory and module structure for source; handling source data is a different problem than handling station data. No matter where the functions end up in the module structure, I think we do need to modify them at the same time to use the new schema structure and reduce the hard-coded names.
-
This is not exactly the same issue as the title of this discussion section, but it is closely enough related that I will put it here. In writing the getting started jupyter notebook I discovered a weird feature of obspy. I considered reporting this on the obspy issues page, but it isn't really their problem; I think it is a problem with stationxml that creates an inconsistency in the abomination they call an Inventory. While I'm being philosophical, I want to assert this is a lesson to all readers who write code in modern OOP languages like python: think about data structures carefully and don't make them more complicated than necessary. As I think I mentioned earlier, when working on this it became clear to me that Inventory is just a python image of what the FDSN stationxml format can define. The problem is that stationxml has to allow for a large range of complexity, which creates the potential for multiple tree structures describing the same data. That is what I am pretty sure is happening here.

To get to the point, the summary is this: data read from stationxml files stored by obspy's mass downloader are parsed into a different tree structure than data downloaded directly with web services. For the record, here are some specifics. In my tutorial I used this incantation to call web services directly:
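The call was of this general form; the client name, network code, channel pattern, and time window below are placeholders for illustration, not the exact query that returned the 446 stations:

```python
from obspy import UTCDateTime
from obspy.clients.fdsn import Client

# Placeholder query: the network, channel, and time window are NOT the
# exact parameters used in the tutorial, just the same kind of call.
client = Client("IRIS")
inv = client.get_stations(
    network="TA",
    channel="BH?",
    starttime=UTCDateTime("2011-01-01"),
    endtime=UTCDateTime("2012-01-01"),
    level="response",
)
print(type(inv))  # obspy.core.inventory.inventory.Inventory
```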
That returns stationxml data for 446 stations. You should be able to run the commands and get the same result. If you run this little set of lines:
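Something along these lines, assuming the Inventory from the call above is in a variable named inv; the print format here is my own, not the lines from the notebook:

```python
# Summarize the Inventory tree: for data pulled directly from web
# services the stations are grouped under a small number of Network
# objects.
for net in inv:
    print(net.code, "has", len(net.stations), "stations")
print("total Network objects:", len(inv.networks))
```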
you should get something like this:
The weirdness is that if you read from a set of files like those I pushed to stampede2 for our test data, you get something quite different. Here is the read line I used (you should be able to readily adapt this to get a similar result by changing the path to whatever is appropriate for stampede2):
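In spirit it was a one-liner like this, with a placeholder path; read_inventory is handed a wildcard so it parses every matching file into a single Inventory:

```python
from obspy import read_inventory

# Placeholder path: point the wildcard at the directory holding the
# single-station stationxml files (e.g. the stampede2 test data area).
inv_files = read_inventory("/path/to/stationxml_files/*.xml")
```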
Now if you run a comparable little iterator loop like this one:
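Comparable to the loop above, just pointed at the file-based Inventory (again, the print format is illustrative):

```python
# Same summary loop, but on the Inventory assembled from many
# single-station files.  Each file becomes its own Network entry, so
# the list is long and each entry reports only one station.
for net in inv_files:
    print(net.code, "has", len(net.stations), "stations")
print("total Network objects:", len(inv_files.networks))
```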
You will get a long list of lines like these:
Why this happens is that the read_inventory function is parsing a large set of files, and each of those files has data for one station and one station only. In contrast, the call to client.get_stations gets all the data in one large file. Thus we get two inventory objects with comparable data stored in completely different tree structures. I think I can fix this rather quickly. I always thought the structure of what I got back from read_inventory on files was weird, and now I know why. If this works as I think it will, we can handle this completely under the hood. I'm writing this long comment to preserve this knowledge, however, because it could come back to bite us sometime in the future if there is some other weird permutation of the stationxml files we haven't seen yet. Also, I emphasize this is a lesson in "keep it simple, stupid".
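For what it is worth, one possible way to flatten this under the hood is to merge Network objects that share a code after reading. This is only a sketch of the idea, not necessarily the fix that will actually go in, and it ignores the complication of conflicting metadata epochs for the same network code:

```python
import copy

from obspy.core.inventory import Inventory


def merge_networks(inv):
    """Collapse an Inventory built from many single-station files
    (one Network object per file) into one Network object per code.
    Sketch only: conflicting epochs for the same code are not handled."""
    merged = {}
    for net in inv:
        if net.code not in merged:
            merged[net.code] = copy.deepcopy(net)
        else:
            merged[net.code].stations.extend(net.stations)
    return Inventory(networks=list(merged.values()), source=inv.source)
```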
-
Yet another potential problem revealed in revising the documentation. While discussing the api for handling site and channel I dug into the way we implemented pickling the channel data. Here is the specific block of code that I am uncertain is correct:
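The literal block is not reproduced here; in essence it does something like the following, where the function name and document dict are my own illustration:

```python
import pickle


def add_serialized_channel(doc, chan):
    """Illustration only, not the literal block from the repository:
    pickle the obspy Channel object and store the raw bytes in the
    channel document under the serialized_channel_data key."""
    doc["serialized_channel_data"] = pickle.dumps(chan)
    return doc
```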
What concerns me is not the structure of that code, but what is saved. As I recall, and the code more or less tells me that recollection is correct, we are serializing what obspy calls a "Channel" object/class. It is described here.
What concerns me when I look at that page is that there is no method in Channel to retrieve the obspy Response object. That is weird, because they have a plot method, which obviously uses that data, suggesting the class actually does hold the response data in some form and the api just hides it.
I think we need to design a simple test that retrieves one of the 'serialized_channel_data' attributes, runs loads on the result to restore it, and verifies it contains response data in some form. There are thousands of stationxml files in the raw data for the tutorial that could be used to design the test. This is not a time-critical test, but it is one we need to do before our initial release, because for some parts of our community response data are critical.
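A sketch of the kind of test meant here, assuming pymongo access to a channel collection; the database name, host, and helper name are illustrative, and whether the response attribute survives the pickle round trip is exactly what the test has to establish:

```python
import pickle

from pymongo import MongoClient


def check_channel_response(db_name="mspass", host="localhost", port=27017):
    """Pull one channel document, unpickle serialized_channel_data, and
    report whether the restored obspy Channel carries a Response object.
    Sketch only: collection and key names assume the current schema."""
    db = MongoClient(host, port)[db_name]
    doc = db.channel.find_one({"serialized_channel_data": {"$exists": True}})
    chan = pickle.loads(doc["serialized_channel_data"])
    resp = chan.response  # obspy stores the Response here as an attribute
    print("response restored:", resp is not None)
    if resp is not None:
        print("number of stages:", len(resp.response_stages))
    return resp
```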