Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OASIS missing paths after saving #277

Open
fnJeff opened this issue Nov 20, 2024 · 14 comments
Open

OASIS missing paths after saving #277

fnJeff opened this issue Nov 20, 2024 · 14 comments

Comments

@fnJeff
Copy link

fnJeff commented Nov 20, 2024

I believe this is related to #268, but I've tried to create something you can actually use to debug the problem. I only realized this was happening when I started getting DRC errors from a flow that has been working correctly for years. The only difference was that I started using GDSTK to convert GDS style properties to OASIS style properties. I say missing 'path' because that is specifically the object that is missing, but in reality is seems to be a single rectangle that is missing, not an entire path, but that is difficult to say without doing a full XOR of the layouts.

I have attached an OASIS file to debug with. I've obfuscated any important information and I only wrote out the metal layer (34) that showed the problem. I hope that you can uncover the general problem from this, because losing any shape at all by using GSDTK is obviously very dangerous in a production environment.
test.oasis.gz

Using the following code, I hope you can easily reproduce the problem. Please ignore the missing reference cell warnings as those are intentionally not included and shouldn't have any effect on the issue.

import gdstk

lib_good  = gdstk.read_oas("test.oasis")

cell_good = lib_good.cells[0]

lib_good.write_oas("test.saved.oasis")

lib_fail  = gdstk.read_oas("test.saved.oasis")

cell_fail = lib_fail.cells[0]

print(f"Good: References: {len(cell_good.references)}")
print(f"Good: Paths     : {len(cell_good.paths)}")
print(f"Good: Polygons  : {len(cell_good.polygons)}")

print(f"Fail: References: {len(cell_fail.references)}")
print(f"Fail: Paths     : {len(cell_fail.paths)}")
print(f"Fail: Polygons  : {len(cell_fail.polygons)}")

When I execute the above code this is the output I'm seeing:

Good: References: 416
Good: Paths     : 449
Good: Polygons  : 935
Fail: References: 416
Fail: Paths     : 448
Fail: Polygons  : 935

Just the path count shows that just saving and reading back in the OASIS has lost something. I did look a little deeper and in my real case I found that the 2nd to last path was missing. In this testcase, you can see that the 3rd to last is missing. So something about my testcase changed which path was getting lost in the saving process.

I used the following (amateur) code to print out all paths that were mismatched based on the GDS property 42 containing the name of the associated net.

for obj in cell_good.paths:
    index = cell_good.paths.index(obj)

    name_good = obj.get_gds_property(42)

    name_fail = '---'
    if index < len(cell_fail.paths):
        name_fail = cell_fail.paths[index].get_gds_property(42)

    if name_good != name_fail:
        print(f"{index:5} : {name_good:40} : {name_fail:40}")

The output from this, in this testcase, shows that the 3rd from the last path seems to be missing from the saved OASIS that was read back in.

  446 : gclk_FC_cg_rxpi_col_reg_0_               : sync_oneshot_phy_strobe/sig_clk         
  447 : sync_oneshot_phy_strobe/sig_clk          : sync_oneshot_phy_strobe/oneshot/sig_d1  
  448 : sync_oneshot_phy_strobe/oneshot/sig_d1   : ---                                     
@fnJeff
Copy link
Author

fnJeff commented Nov 25, 2024

Looks like @jatoben found and fixed the cause of this. Thanks!!!

I hate to push, but I have to ask... any chance this is releasable as a 0.9.58 version soon-ish? My implementation solution using gdstk is currently unusable because of these missing path shapes.

@heitzmann
Copy link
Owner

I was going to do it this weekend, but had no time to it. I'm starting the build in a few minutes!

@fnJeff
Copy link
Author

fnJeff commented Nov 26, 2024

Thanks for the release!

@jatoben Unfortunately, I hate to say it, but I don't think this fixed the issue I reported unless I'm missing something. At least I'm still seeing the exact same problem after updating to 0.9.58 (using python 3.13.0). I won't repost it since all output is identical to my original post.

Not sure you want to repoen this issue, or if I should create a new one.

1111-A0:> pip show gdstk
Name: gdstk
Version: 0.9.58
Summary: Python module for creation and manipulation of GDSII files.
Home-page: 
Author: 
Author-email: "Lucas H. Gabrielli" <[email protected]>
License: Boost Software License - Version 1.0 - August 17th, 2003
1111-A0:> python
Python 3.13.0 (main, Nov  3 2024, 23:53:35) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

@fnJeff
Copy link
Author

fnJeff commented Nov 26, 2024

@heitzmann Please let me know if you want to reopen this issue or if I should create a new one. I have no idea what normal protocol is for this case.

Since I'm sort of stuck until I have this working, I'm doing my best to dig into the issue despite having almost zero cpp experience. Basically, I'm just looking for anything that might help you debug on your end.

The below experiment proves that whatever is happening, is happening when writing the OASIS, and that it isn't being read from the original OASIS this way. On top of that, the original paths are modified in the orignal cell object, so any further debugging would be looking at the modified paths.

import gdstk

lib  = gdstk.read_oas("test.oasis")
cell = lib.cells[0]
paths = [obj for obj in cell.paths if obj.get_gds_property(42) == 'gclk_FC_cg_rxpi_col_reg_0_']

print([len(obj.spine()) for obj in paths])

lib.write_oas("test.saved.oasis")

print([len(obj.spine()) for obj in paths])

Output from experiment:

[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
<python-input-0>:11: RuntimeWarning: Empty path.
  lib.write_oas("test.saved.oasis")
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]

From this you can see the new warning about an empty path, another confirmation that I have the latest release with fix from @jatoben. The problem is that it isn't an empty path. In the list of paths for this net the one with the modified number of spines reported is 37.

Reloading the original OASIS to look closely at path 37, you can see that this is a non-empty horizontal path of length 1e-9, which is the default tolerance length. This is exactly what @jatoben fixed and why I expected it to address this issue. But after the write, you can see that the spine() returns only a single point now, while path_spines() returns the orignal 2 points.

>>> paths[37].spine()
array([[3.279, 5.584],
       [3.28 , 5.584]])
>>> paths[37].path_spines()
[array([[3.279, 5.584],
       [3.28 , 5.584]])]

>>> lib.write_oas("test.saved.oasis")
<python-input-3>:1: RuntimeWarning: Empty path.
  lib.write_oas("test.saved.oasis")

>>> paths[37].spine()
array([[3.279, 5.584]])
>>> paths[37].path_spines()
[array([[3.279, 5.584],
       [3.28 , 5.584]])]

So while @jatoben's fix definitely addressed paths with a length exactly equal to the tolerance, it seems that maybe that check is hiding somewhere else as well?

@fnJeff
Copy link
Author

fnJeff commented Nov 26, 2024

I can cause the same warning and missing path by just running to_polygons() on the path. I don't understand exactly why this is being called on any of the paths because they are all simple paths. From what I could find it looked like to_polygons was only called on the FlexPath if it was not a simple path.

>>> [obj.simple_path for obj in paths]
[True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True]

>>> paths[37].simple_path
True

Below you can see that executing to_polygons() on the path causes the new warning as well as the destruction of the 2nd spine point.

>>> paths[37].spine()
array([[3.279, 5.584],
       [3.28 , 5.584]])

>>> paths[37].to_polygons()
<python-input-7>:1: RuntimeWarning: Empty path.
  paths[37].to_polygons()
[]

>>> paths[37].spine()
array([[3.279, 5.584]])

@jatoben
Copy link
Collaborator

jatoben commented Nov 27, 2024

@fnJeff Thanks for the additional details! Sadly, the fix I proposed was insufficient for all cases because of error introduced with floating point math. Sorry about that 😢

Using the points you provided as an example:

>>> 3.28 - 3.279
0.0009999999999998899

The result is that the length calculation in remove_overlapping_points() still ends up with a value that's less than the path tolerance.

It would be possible to use an epsilon value in that comparison, but I think the primary thing the function is trying to do is determine whether GDS or OASIS can represent consecutive points. Right now it doesn't know what the output is going to be, or the precision of that output (only the path's own tolerance).

@heitzmann I'm curious to hear if you have any thoughts on the best approach.

@heitzmann
Copy link
Owner

Maybe lower the tolerance of you need points that close together...

@fnJeff
Copy link
Author

fnJeff commented Nov 27, 2024

Maybe lower the tolerance of you need points that close together...

So the problem with that is that I don't set the tolerance, the foundry does. I also do not "need" points that close together, the place and route tool creates shapes with knowledge of the foundry's minimum allowable grid, in this case 1 nm.

So while I understand your reply, tweaking the foundry's real physical limitation of 1 nm to workaround the floating point precision error feel's a bit hacky. And since creating shapes using the actual minimum grid of 1 nm is fully legal, I think we need a cleaner and more general solution. I will see if I can find anything useful today.

Given that the original issue reported wasn't fully solved, can we reopen this issue until we figure out a solution?

@heitzmann heitzmann reopened this Nov 27, 2024
@fnJeff
Copy link
Author

fnJeff commented Nov 27, 2024

Just throwing the idea out there in hopes it gives you guys an idea, but what we really want to do is first do the floating point subtraction and then quantize the result to whatever the precision is before comparing it with the precision. In this case it is 1 nm, but in another technology I will be working in soon I know that uses a grid of 0.5 nm.

If the floating point result is first quantized by the precision before it is compared with it I think that would give us what we are trying to do. Find out of the point is representable by the minimum grid.

@fnJeff
Copy link
Author

fnJeff commented Nov 27, 2024

So my CAD tools typically specify something they call "technology scale factor" which is usually relative to um. So a grid of 1 nm would have a tech scale factor of 1000 while 0.5 nm would be 2000. This tech scale factor is obviously just the result of (unit / precision).

So I think the following python code quantizes the result of the floating point subtraction based on the unit and precision values.

scale_factor = unit / precision

if (round(scale_factor * (coordinate_1 - coordinate_2)) / scale_factor) < tolerance: return error_condition

@heitzmann
Copy link
Owner

It might be even better to make this comparison in scaled coordinates, something like

if (round(scale_factor * (coordinate_1 - coordinate_2))) < scale_factor * tolerance: return error_condition

However, the best solution would be to use scaled, integer coordinates internally (which would also solve a couple of other path issues), but that's a large refactor that I have no idea when I'd be able to implement myself.

@fnJeff
Copy link
Author

fnJeff commented Nov 27, 2024

Absolutely agree with that as a nice clean fix, and I completely understand the time constraint issue. I really wish I could help out more with something, but trying to teach myself cpp while doing so probably isn't cost effective. Using scaled integer coordinates instead of double floats might also reduce the memory footprint (EE pretending to know anything about software... 🤣)?

What do you think about patching up this discrepancy in the remove_overlapping function for the short term? Maybe add a parameter to the path class that would enable a special workaround for this? That way I could loop through the paths, check if the workaround parameter exists and is set False, and then set it to True. This way in a future version, you could easily remove the parameter once you have a more global fix.

Just trying to come up with some kind of solution that isn't super kludgy but doesn't lose important data from the OASIS files so I can use it in my tool flow. And I'm soft of running out of time to get it fully working.

@fnJeff
Copy link
Author

fnJeff commented Nov 27, 2024

I also wanted to point out, as shown above in #277 (comment), the remove_overlapping function handles removing a path that doesn't meet tolerance in a sort of weird way. It only removes the second point in the spine and leaves the path_spines() unchanged. I'm not really sure what a cleaner removal would be, but this might be something that you want to think about as well.

@fnJeff
Copy link
Author

fnJeff commented Dec 2, 2024

Realizing that you guys may not be able to solve this before I need to have it working, I came up with a temporary workaround for the meantime. I'm only including here in hopes that it may be helpful for someone else in a similar situation.

The following code searches through all points of all spines of all paths of all cells looking for any that might meet this somewhat specific floating point precision error pattern (less than 1e-8% the tolerance). When met, I adjust the tolerance so that it won't fail when the remove_overlapping_points function is called. It is currently working in my flow. Anyone using this would have to adjust for whatever property type/name/number you are storing the names in, or remove it if names are not available. This part is specific to my implementation.

for cell in lib.cells:
   for path in cell.paths:
      tol   = path.tolerance
      spine = path.spine()

      if len(spine) < 2:
         print(f"Invalid path spine of less than 2 points")
         print(f"   Path name: '{path.get_gds_property(42)}'")
         print(f"   Tolerance: {tol}")
         print(f"   Spine    :")
         print(f"{spine}\n")
         continue

      for ii in range(1, len(spine)):
         dist = np.linalg.norm(spine[ii] - spine[ii - 1])
         error = tol - dist

         if (dist < tol) and (error < (1e-10 * tol)):
            print(f"Path spine contains consecutive points with distance less than path tolerance likely caused by floating point error")
            print(f"   Path name: '{path.get_gds_property(42)}'")
            print(f"   Tolerance: {tol}")
            print(f"   Spine point[{ii-1:3}]: {spine[ii-1]}")
            print(f"   Spine point[{ii:3}]: {spine[ii]}")
            print(f"   Distance: {dist}")
            print(f"   Error   : {error}")
            print(f"Adjusting path tolerance by floating point error so the path isn't removed")
            path.tolerance -= error
            tol = path.tolerance
            print(f"   Adjusted tolerance: {tol}")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants