Reading huge (21GB) Oasis file #280
Comments
Have you tried doing it with KLayout instead? I find that KLayout typically behaves better in situations like this.
Thanks for the suggestion, but unfortunately our app is closed source, so the KLayout copyleft license means I can't use it for our app.
This is something that can be added, but it will require some work. Basically, we need to add a filter list in |
I will probably do this. However, I am not sure I understand how some of the modal variables work.

I thought the point of the modal variables was that if a record doesn't include some of its attributes, they can be reused from the previous record through the modal variable. So, for example, if you have a bunch of polygons with mostly the same attributes, you could read them once and reuse them over and over.

However, this doesn't seem to be the case for all of the modal variables. Take modal_repetition: we only copy this repetition to the current record being read if the record contained a repetition. I don't see anywhere that the code reuses the modal_repetition that was read by a previous record.

modal_textlayer and modal_texttype, on the other hand, get used to set the tag of a label even if the label didn't include a layer or type. It looks like I could have one label record that included a layer and datatype, and if every label after that didn't include any layer or datatype, it would just reuse the previous modal values over and over. This is different from modal_repetition; I can't see what the purpose of modal_repetition is, since it only gets copied if the current record includes a repetition.

Another example is modal_polygon_points: it looks like if I had one polygon with a point list, then a bunch with no point list, it would just reuse modal_polygon_points for them all.

So I guess I don't really understand the inconsistency in the way the modal variables are used.

Then if I look at modal_layer and modal_datatype, those are shared between polygons, paths, trapezoids, circles, and XGEOMETRY. Is that how it's supposed to work? But labels don't share the same modal layer or modal datatype as other shapes?

I just want to make sure I understand how it's supposed to work before I go off making code changes.
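The layer/datatype behaviour described in this comment can be illustrated with a small sketch. Everything here (`ModalState`, `RecordFields`, `Tag`, and the two `resolve_*` functions) is hypothetical and heavily simplified; it is not gdstk's code. It only shows the pattern the comment describes: a field present in the record overwrites the modal value, a missing field falls back to it, and labels use their own textlayer/texttype pair.

```cpp
#include <cstdint>

// Hypothetical, simplified model; these are NOT gdstk's actual types.
struct ModalState {
    // Zero-initialised only to keep the sketch simple.
    uint32_t layer = 0, datatype = 0;      // shared by polygons, paths, etc.
    uint32_t textlayer = 0, texttype = 0;  // used only by labels
};

// Which fields a record carried, plus their values when present.
struct RecordFields {
    bool has_layer;
    uint32_t layer;
    bool has_type;
    uint32_t type;
};

struct Tag {
    uint32_t layer;
    uint32_t type;
};

// Shape records: explicit fields update the modal state, missing ones reuse it.
Tag resolve_shape_tag(const RecordFields& rec, ModalState& modal) {
    if (rec.has_layer) modal.layer = rec.layer;
    if (rec.has_type) modal.datatype = rec.type;
    return Tag{modal.layer, modal.datatype};
}

// Label records follow the same rule but with a separate pair of modal values.
Tag resolve_label_tag(const RecordFields& rec, ModalState& modal) {
    if (rec.has_layer) modal.textlayer = rec.layer;
    if (rec.has_type) modal.texttype = rec.type;
    return Tag{modal.textlayer, modal.texttype};
}
```

With this model, one shape record carrying layer 5 / datatype 2 followed by any number of shape records with neither field resolves them all to (5, 2), while a label in between does not disturb that pair. As far as I understand the OASIS specification, repetitions get the same effect differently: the record still carries a repetition field, and a dedicated "reuse previous" repetition type refers back to the modal repetition.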
OK so I was missing something about how modal_repetition works before, sorry.
Hello @walkerstop, if possible, could you create a pull request on your implementation? I face a similar problem. Thanks in advance.
I created the pull request with that and also .gds.gz support: But I need help testing it. I only tested these features in my closed-source app, which requires a lot of other code changes. I know it works there.
Hi, I was wondering if anyone has had success reading very large .oas files.
I am using the C++ gdstk library, built and running on a SuSE Linux server that has 512GB of RAM.
My .oas file is 21GB.
The call to gdstk::read_oas() has been running for 2 days and has consumed 110GB of RAM so far, but it still has not returned.
Just wondering if anyone had any ideas to try. I am also running some performance profiling to see where the bottleneck is but haven't looked into the results yet.
I believe (based on limited information so far) that a lot of the CPU cycles are being spent in calls to calloc() coming from allocate_clear().
In my case I actually only need to read a few of the layer/data types (tags) in the file.
I have already added shape_tags and label_tags filters (like gdstk::read_gds() already has) and so I am throwing away MOST of the content of this file.
However, the way that I implemented the shape_tags and label_tags filters was kind of stupid and is probably not helping the performance. I let read_oas() allocate the structures and read the elements, and only once I know what the tag is do I check it: if it's not a tag I want, I free the structures and don't add them to the library.
I know this is wasting a lot of cycles allocating and then immediately freeing memory.
I was thinking it might improve the performance if I read elements into temporary structures on the stack until I have read the tag, and only then allocate and add them to the library if I need to keep them. I haven't tried this yet. It's not super straightforward to me how to do it, because of some of the modal pointers and because the space needed to read an element is not always the same.
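A minimal sketch of that "decode the header on the stack, allocate only for wanted tags" idea is below. It uses a made-up toy record format (one byte each for layer, datatype, and point count, followed by two bytes per point), not OASIS, and all names (`pack_tag`, `Polygon`, `read_filtered`) are hypothetical. The toy format is length-prefixed, which sidesteps both difficulties mentioned above; a real OASIS reader would presumably still have to decode skipped records far enough to find the next record and keep the modal variables current.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

using Tag = uint64_t;

// Hypothetical helper: packs layer and datatype into a single key.
inline Tag pack_tag(uint32_t layer, uint32_t type) {
    return (static_cast<uint64_t>(type) << 32) | layer;
}

struct Polygon {
    Tag tag;
    std::vector<uint8_t> coords;  // heap storage, created only for kept records
};

// Toy record layout (NOT OASIS): [layer][datatype][point_count][x y]...
std::vector<Polygon> read_filtered(const std::vector<uint8_t>& data,
                                   const std::unordered_set<Tag>& wanted) {
    std::vector<Polygon> kept;
    size_t pos = 0;
    while (pos + 3 <= data.size()) {
        // The header is decoded into locals on the stack; nothing is allocated yet.
        uint32_t layer = data[pos];
        uint32_t type = data[pos + 1];
        size_t payload = 2 * static_cast<size_t>(data[pos + 2]);
        pos += 3;
        if (pos + payload > data.size()) break;  // truncated record

        Tag tag = pack_tag(layer, type);
        if (wanted.count(tag) != 0) {
            // Only now pay for an allocation and a copy of the payload.
            Polygon polygon;
            polygon.tag = tag;
            polygon.coords.assign(data.data() + pos, data.data() + pos + payload);
            kept.push_back(std::move(polygon));
        }
        pos += payload;  // unwanted records are skipped without any allocation
    }
    return kept;
}
```

The design choice is simply to move the tag test ahead of the first allocation, so records that are going to be thrown away never touch the allocator.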
Anyway, if anyone has any ideas, I'd love to hear them.
If I do manage to support shape_tags and label_tags in read_oas() in a smarter way, I would be happy to open a pull request in case the change is useful to others, but I have not gotten that far.
Something I have tried already is using larger read buffers instead of my system's default 8KB buffers. That didn't seem to help much.
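For reference, one standard way to enlarge the stdio buffer is `setvbuf()`, called after opening the stream and before the first read. The sketch below is only an illustration of that call; the function name `sum_bytes_buffered` and the byte-summing workload are made up for the example and have nothing to do with gdstk's reader.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Sums every byte of a file through fread(), using a caller-chosen stdio
// buffer size. Returns false if the file cannot be opened, the buffer cannot
// be installed, or a read error occurs.
bool sum_bytes_buffered(const char* path, size_t stdio_buffer_size, uint64_t& sum_out) {
    std::FILE* file = std::fopen(path, "rb");
    if (file == nullptr) return false;

    // setvbuf() must be called before any other operation on the stream,
    // and the buffer must stay alive until the stream is closed.
    std::vector<char> stdio_buffer(stdio_buffer_size);
    if (std::setvbuf(file, stdio_buffer.data(), _IOFBF, stdio_buffer.size()) != 0) {
        std::fclose(file);
        return false;
    }

    uint8_t chunk[4096];
    uint64_t sum = 0;
    size_t count;
    while ((count = std::fread(chunk, 1, sizeof chunk, file)) > 0) {
        for (size_t i = 0; i < count; i++) sum += chunk[i];
    }

    bool ok = std::ferror(file) == 0;
    std::fclose(file);  // closed here, while stdio_buffer still exists
    if (ok) sum_out = sum;
    return ok;
}
```

A bigger buffer mainly reduces the number of `read()` system calls, so it would not be expected to help much if the time is going into allocation instead.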
Another idea I have not tried yet would be using mmap() instead of fread(), but based on what I've read so far, it's not clear to me whether this would be much faster or not. I kind of doubt that the bottleneck is fread() anyway, but I should know more once I've done more profiling.
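For comparison, a read-only `mmap()` of a whole file on a POSIX system could look like the sketch below. The function name `sum_bytes_mmap` and the byte-summing loop are hypothetical stand-ins for real parsing; the point is only the open / fstat / mmap / munmap sequence. Whether this beats `fread()` depends on the access pattern, and it would not address time spent in allocation.

```cpp
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Sums every byte of a file by mapping it read-only into memory (POSIX only).
// Returns false if the file cannot be opened, inspected, or mapped.
bool sum_bytes_mmap(const char* path, uint64_t& sum_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;

    struct stat info;
    if (fstat(fd, &info) != 0) {
        close(fd);
        return false;
    }

    size_t size = static_cast<size_t>(info.st_size);
    uint64_t sum = 0;
    if (size > 0) {  // mmap() rejects zero-length mappings
        void* mapping = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (mapping == MAP_FAILED) {
            close(fd);
            return false;
        }
        // Advisory hint that access is sequential; its result is ignored.
        posix_madvise(mapping, size, POSIX_MADV_SEQUENTIAL);

        const uint8_t* bytes = static_cast<const uint8_t*>(mapping);
        for (size_t i = 0; i < size; i++) sum += bytes[i];
        munmap(mapping, size);
    }

    close(fd);
    sum_out = sum;
    return true;
}
```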