
NullPointerException during stitching #1124

Open
ElaHobby opened this issue Dec 15, 2024 · 12 comments

Comments

@ElaHobby
Contributor

Hi,

I have an issue in the BlockStitcher and would appreciate your help. I am running rapid_compile_ipi; I generated all routed.dcp files based on the PBlocks, and the flow stops in the BlockStitcher as described below.

The line that throws the error (in rapidwright.ipi.BlockStitcher):

    ModuleInst mi = stitched.createModuleInst(modInstName, modImpls.get(implementationIndex));

It starts iterating over the modules (in the loop for (Entry<String,String> e : ipNames.entrySet())) but stops at the first one and throws:

Exception in thread "main" java.lang.NullPointerException: Cannot invoke "com.xilinx.rapidwright.edif.EDIFNetlist.getLibrary(String)" because the return value of "com.xilinx.rapidwright.design.Design.getNetlist()" is null
        at com.xilinx.rapidwright.design.Design.createModuleInst(Unknown Source)
        at com.xilinx.rapidwright.design.Design.createModuleInst(Unknown Source)
        at com.xilinx.rapidwright.ipi.BlockStitcher.main(BlockStitcher.java:541)
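For context, the failing call runs once per entry of the ipNames loop mentioned above. The following is a hypothetical, simplified model of that loop, not RapidWright code: strings stand in for Module/ModuleInst objects, and it assumes each map entry goes from an instance name to a key identifying a cached IP. It only shows the pattern of picking modImpls.get(implementationIndex) per entry and failing with a descriptive message when no cached implementation exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of the per-IP loop in BlockStitcher.main.
 * Strings stand in for RapidWright's Module/ModuleInst objects; the names
 * used here are illustrative and are not RapidWright APIs.
 */
class StitchLoopSketch {

    /**
     * For each entry (assumed: instance name -> cached IP key), pick the
     * implementation at implementationIndex and record "instance -> impl".
     */
    static List<String> createInstances(Map<String, String> ipNames,
                                        Map<String, List<String>> implsByIp,
                                        int implementationIndex) {
        List<String> created = new ArrayList<>();
        for (Map.Entry<String, String> e : ipNames.entrySet()) {
            List<String> modImpls = implsByIp.get(e.getValue());
            if (modImpls == null || implementationIndex < 0
                    || implementationIndex >= modImpls.size()) {
                // Fail with a descriptive message instead of a bare exception.
                throw new IllegalStateException("No cached implementation "
                        + implementationIndex + " for IP " + e.getValue());
            }
            created.add(e.getKey() + " -> " + modImpls.get(implementationIndex));
        }
        return created;
    }
}
```

Because the loop stops at the first failing entry, an error on the very first IP (as in the trace above) says nothing yet about the remaining IPs.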

The problem is that the Design class is closed source, so I cannot debug it to figure out what is wrong. It seems like the netlist is empty? But isn't this expected, given the line just before it: stitched.setNetlist(null);? If I comment out that line, it gets through a few modules without complaining (compared to the above, where it stops at the first one), but then it throws another error at a later module...

Exception in thread "main" java.lang.RuntimeException: ERROR: Destination netlist already contains EDIFCell named 'axis_broadcaster_v1_1_32_core' in library 'work'
        at com.xilinx.rapidwright.edif.EDIFNetlist.copyCellAndSubCellsWorker(EDIFNetlist.java:665)
        at com.xilinx.rapidwright.edif.EDIFNetlist.copyCellAndSubCellsWorker(EDIFNetlist.java:656)
        at com.xilinx.rapidwright.edif.EDIFNetlist.copyCellAndSubCellsWorker(EDIFNetlist.java:656)
        at com.xilinx.rapidwright.edif.EDIFNetlist.copyCellAndSubCells(EDIFNetlist.java:626)
        at com.xilinx.rapidwright.design.Design.a(Unknown Source)
        at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1228)
        at com.xilinx.rapidwright.design.Design.addModule(Unknown Source)
        at com.xilinx.rapidwright.design.Design.createModuleInst(Unknown Source)
        at com.xilinx.rapidwright.design.Design.createModuleInst(Unknown Source)
        at com.xilinx.rapidwright.ipi.BlockStitcher.main(BlockStitcher.java:537)

Because the code is closed source, it is difficult to debug it and to determine what is wrong here. I am a bit stuck and would appreciate a hint.

@clavin-xlnx
Member

Unfortunately, the BlockStitcher flow and the rapid_compile_ipi script are not something we have been able to maintain in recent versions. It looks like you are using a rather old version of RapidWright. Would you be willing to update to RapidWright 2024.2.0 and share your input files? If so, we can identify what the issues are and hopefully resolve them.

It looks like there are issues with handling the logical netlist, which has seen many substantial changes and improvements in recent years, so BlockStitcher and other related classes may just need some corresponding updates.

@ElaHobby
Contributor Author

First of all, thanks for your reply!

I used the latest master version with the latest release. I was curious to see what had changed in the past year in the PBlock generator and the stitcher. I looked over the pre-implemented flow documentation and saw that the flow still goes through the rapid_compile_ipi script. That's why I assumed this is still the current way to run the flow with pre-implemented modules.

Sadly, due to confidentiality issues I cannot share my current design. But I'll create a simpler one and re-run the flow to reproduce the issue.

@ElaHobby
Contributor Author

If BlockStitcher has not been maintained in recent versions, is there an alternative flow for rapid prototyping with pre-implemented blocks that we should use? Or is it just that this part of RW was not maintained in newer versions? I think I have a working local version from 2021, so I would use that if it's the latter case 😊

@clavin-xlnx
Member

> rapid_compile_ipi, I generated all routed.dcp files based on the PBlock and the flow stops in the BlockStitcher as described below.

The problem we had was less around the BlockStitcher and more around the interface between IP Integrator and the design files coming out of Vivado. Each release of Vivado might change how data is handled, or the version of some IP used in an example might change, which made the flow cumbersome to maintain. The Module and ModuleInst classes have been maintained and are still used in many ways today. One of the latest demonstrations of this is DynaRapid, which composes pre-implemented blocks from a dataflow graph generated by Dynamatic (from C code).

Since the BlockStitcher was primarily used with the rapid_compile_ipi script and we haven't built many other flows utilizing it, it has not been part of our test suites.

Perhaps if you can share additional details about what you are trying to do, we might be able to help in another way?

@ElaHobby
Contributor Author

Thanks for the info. Indeed, I had previously seen the DynaRapid paper (interesting work!)

The project I am using is very large, and I'm not sure I am allowed to put it on GitHub. But I created another dummy project that leads to the same errors (not a smart application scenario, just something to replicate the issue):
https://github.com/ElaHobby/debugRW

  • Vivado version used: 2024.2
  • Project: Debug folder
  • Other folders: IPs used within the Debug project

How I run it:

  • I use the Vivado tcl shell
  • open_project debug
  • cd Debug.srcs\sources_1\bd\design_1
  • open_bd_design design_1.bd
  • source ${::env(RAPIDWRIGHT_PATH)}/tcl/rapidwright.tcl
  • rapid_compile_ipi

It's getting close to Christmas and I don't want to bother you with such issues, so whenever you have time after the holidays, I'd appreciate your help!

@clavin-xlnx
Member

Thank you for the example. Unfortunately, after spending a considerable amount of time, I was not able to update the flow to reproduce the issues you were seeing. There have been several changes over the many Vivado releases since we originally built this flow and there are a number of issues that need to be addressed. With time, I think we could get the flow to work again, but its utility would be limited to a very niche subset of use cases. Most of the problem is trying to synchronize the Vivado and RapidWright interfacing steps.

Are you able to provide all the inputs to the BlockStitcher for this example? Perhaps I can debug just the Java portion?

@ElaHobby
Contributor Author

ElaHobby commented Jan 3, 2025

Tools version
When you say you were unable to reproduce the issue, does this mean your flow works end to end, or that you don't get to the stitching step? Which Vivado version, RW commit, and data & jars release version are you using? In my case:

  • Vivado 2024.2,
  • for data & jars, RapidWright 2024.2.0-beta Release,
  • RW Repo commit: f803bdc

How to reproduce it more easily
I also uploaded my IP cache to my demo "debugRW" repo (link above). With it, you should be able to get directly to the stitching step when running RW. I had to make some manual fixes in a few steps to get there (dcp folders and PBlock generation), but with my IP cache you should not run into them anymore and should get directly to the stitcher error described in this issue.

Java portion with issue:
I get the null pointer exception at line 538. But the fact that the netlist is cleared on line 537 and required on line 538 seems a bit odd to me. However, the Design class is closed source, which is why I couldn't debug it on my own and created this issue.
Code lines:

537: stitched.setNetlist(null);
538: ModuleInst mi = stitched.createModuleInst(modInstName, modImpls.get(implementationIndex));
539: stitched.setNetlist(tmp);

Error:

Exception in thread "main" java.lang.NullPointerException: Cannot invoke "com.xilinx.rapidwright.edif.EDIFNetlist.getLibrary(String)" because the return value of "com.xilinx.rapidwright.design.Design.getNetlist()" is null
        at com.xilinx.rapidwright.design.Design.createModuleInst(Unknown Source)
        at com.xilinx.rapidwright.design.Design.createModuleInst(Unknown Source)
        at com.xilinx.rapidwright.ipi.BlockStitcher.main(BlockStitcher.java:538)
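The trace itself says why line 538 fails: while the netlist is temporarily null, anything inside createModuleInst that calls getNetlist() dereferences null. A minimal sketch with hypothetical stand-in classes (Netlist, MiniDesign, and withNetlistCleared are illustrative names, not RapidWright APIs) reproduces that, and also shows that restoring the netlist in a finally block keeps the restore on line 539 from being skipped when the call throws.

```java
import java.util.function.Supplier;

/**
 * Hypothetical stand-ins for Design and EDIFNetlist. The only behaviour taken
 * from the stack trace is that createModuleInst calls
 * getNetlist().getLibrary(...), so it fails while the netlist is null.
 */
class NullNetlistSketch {

    static class Netlist {
        String getLibrary(String name) {
            return "work".equals(name) ? "work" : null;
        }
    }

    static class MiniDesign {
        private Netlist netlist = new Netlist();

        Netlist getNetlist() { return netlist; }

        void setNetlist(Netlist n) { netlist = n; }

        String createModuleInst(String instName) {
            // Same shape as the failing call: throws NPE if netlist is null.
            String lib = getNetlist().getLibrary("work");
            return lib + "/" + instName;
        }
    }

    /** Save / clear / restore, as on lines 537-539, but restoring in finally. */
    static <T> T withNetlistCleared(MiniDesign d, Supplier<T> action) {
        Netlist tmp = d.getNetlist();
        d.setNetlist(null);
        try {
            return action.get();
        } finally {
            d.setNetlist(tmp); // runs even when action throws
        }
    }
}
```

The temporary-null pattern can only work if the callee never touches the netlist, which, judging by the trace, is no longer the case for createModuleInst.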

I suppose lines 537 and 539 should be deleted, but then I get another error at the same Java line. It looks like the flow does not support block designs with two instances of the same Vivado IP whose IP parameters differ. But if deleting those two lines is the correct way to go, then at least EDIFNetlist is not closed source, so I can debug it on my side from here.

Exception in thread "main" java.lang.RuntimeException: ERROR: Destination netlist already contains EDIFCell named 'blk_mem_gen_v8_4_9' in library 'work'
        at com.xilinx.rapidwright.edif.EDIFNetlist.copyCellAndSubCellsWorker(EDIFNetlist.java:665)
        at com.xilinx.rapidwright.edif.EDIFNetlist.copyCellAndSubCellsWorker(EDIFNetlist.java:656)
        at com.xilinx.rapidwright.edif.EDIFNetlist.copyCellAndSubCells(EDIFNetlist.java:626)
        at com.xilinx.rapidwright.design.Design.a(Unknown Source)
        at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1228)
        at com.xilinx.rapidwright.design.Design.addModule(Unknown Source)
        at com.xilinx.rapidwright.design.Design.createModuleInst(Unknown Source)
        at com.xilinx.rapidwright.design.Design.createModuleInst(Unknown Source)
        at com.xilinx.rapidwright.ipi.BlockStitcher.main(BlockStitcher.java:538)
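The error message suggests that cells are copied into a shared 'work' library keyed by cell name, so two differently parameterized instances of the same IP can both bring a cell named blk_mem_gen_v8_4_9 that differs internally. The sketch below is a hypothetical model of that situation, not EDIFNetlist's implementation: the "reuse if identical, reject if different" policy is an assumption chosen for illustration, and the real check may reject any duplicate name.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of copying an IP's cell into a shared 'work' library
 * keyed only by cell name. Not EDIFNetlist's implementation.
 */
class DuplicateCellSketch {

    // cell name -> a summary of the cell's contents (e.g. its parameters)
    private final Map<String, String> work = new HashMap<>();

    /**
     * Adds a cell; reuses an identical existing cell, but refuses a different
     * cell that happens to share the same name.
     */
    String copyCell(String cellName, String contents) {
        String existing = work.putIfAbsent(cellName, contents);
        if (existing != null && !existing.equals(contents)) {
            throw new RuntimeException("ERROR: Destination netlist already contains EDIFCell named '"
                    + cellName + "' in library 'work'");
        }
        return cellName;
    }

    int size() {
        return work.size();
    }
}
```

Under this model, identical IP configurations would share one cell, while differing configurations would need distinct names (and updated references) before they could coexist in one library.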

@clavin-xlnx
Member

I am using the same version of Vivado and RapidWright that you have mentioned. However, my environment might still be different. Could you try reproducing the issue in a separate environment from your own? For example, what are the exact steps needed from a clean checkout of RapidWright and a download of RW.zip and IPCache.zip?

git clone https://github.com/Xilinx/RapidWright.git
cd RapidWright
git checkout f803bdc8749c085c95ad939309119e04bd37152b
# Any other changes to the RapidWright code base?
./gradlew compileJava
export PATH=`pwd`/bin:$PATH
export CLASSPATH=`pwd`/bin:`pwd`/jars/*
mkdir debugRW
cd debugRW
wget https://github.com/ElaHobby/debugRW/raw/refs/heads/main/RW.zip
unzip RW.zip
wget https://github.com/ElaHobby/debugRW/raw/refs/heads/main/IPCache.zip
unzip IPCache.zip
cd ..
# What other commands, environment settings, etc?
java com.xilinx.rapidwright.ipi.BlockStitcher # <what are the parameters being used?>

Are you able to identify the parameters being passed to the BlockStitcher? This would avoid having to call Vivado or keep it in the loop. You could add a puts statement to the Tcl script to print them just before the BlockStitcher is called. It would help significantly if we didn't need Vivado to run the part of the flow we are trying to debug.

@ElaHobby
Contributor Author

ElaHobby commented Jan 3, 2025

> For example, what are the exact steps needed from a clean checkout of RapidWright?
I cloned the RW project as you did and used the commit you wrote (1). For data & jars (2), I simply downloaded them (RapidWright 2024.2.0-beta Release) and copied them to the RW folder. I imported this RW project (1), together with the downloaded data & jars folders (2), into Eclipse.

> Any other changes to the RapidWright code base?
No, it's the original code. To generate the IP cache and get to the stitching step, I did make some manual changes to the file structure and the PBlock files, but those changes should not be needed on your side now, since I uploaded the IP cache ready to be used in the stitching step.

> What other commands, environment settings, etc?
I suppose you have already set the RAPIDWRIGHT_PATH environment variable. In the commands above you did not set IP_CACHE_PATH, which should point to the folder containing the unzipped IPCache.zip.
I listed the commands I use to run the flow in a comment above; here they are again:

  • I use the Vivado tcl shell
  • cd to the Debug folder where you unzipped the project
  • open_project debug
  • cd Debug.srcs\sources_1\bd\design_1
  • open_bd_design design_1.bd
  • source ${::env(RAPIDWRIGHT_PATH)}/tcl/rapidwright.tcl
  • rapid_compile_ipi

> Are you able to identify the parameters that are being passed to the BlockStitcher?
java -Xss16M com.xilinx.rapidwright.ipi.BlockStitcher C:/Users/myuser/IP_CACHE C:/Users/myuser/Debug/design_1.edf C:/Users/myuser/Debug/design_1_ips.txt
where C:/Users/myuser/Debug is the folder with the unzipped debug project and IP_CACHE is the folder with the unzipped cache.
I think with the uploaded zips you should be able to directly run the BlockStitcher command, without having to go through the entire rapid_compile_ipi flow.
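Since the stitcher can be run directly from its three arguments (IP cache folder, design_1.edf, design_1_ips.txt, in the order used in the command above), a quick pre-flight check can catch path mistakes before the stitcher starts. StitcherArgsCheck is a hypothetical helper, not part of RapidWright; it only checks that the paths exist and says nothing about their contents.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical pre-flight check for the three BlockStitcher arguments:
 * IP cache folder, top-level EDIF file, and the *_ips.txt file.
 */
class StitcherArgsCheck {

    static List<String> problems(String[] args) {
        List<String> errs = new ArrayList<>();
        if (args.length != 3) {
            errs.add("expected 3 arguments: <ipCacheDir> <design.edf> <design_ips.txt>");
            return errs;
        }
        Path cache = Paths.get(args[0]);
        Path edf = Paths.get(args[1]);
        Path ips = Paths.get(args[2]);
        if (!Files.isDirectory(cache)) errs.add("IP cache folder not found: " + cache);
        if (!Files.isRegularFile(edf)) errs.add("EDIF file not found: " + edf);
        if (!Files.isRegularFile(ips)) errs.add("IP list file not found: " + ips);
        return errs;
    }

    public static void main(String[] args) {
        List<String> errs = problems(args);
        errs.forEach(System.err::println);
        if (errs.isEmpty()) System.out.println("Arguments look OK");
    }
}
```

It could be run with the same three arguments as the java command above before launching the stitcher itself.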

Where does your flow stop?

If it still doesn't run with the info above, don't invest more time in it. I think removing lines 537 and 539 should do the trick, and then I'll debug the EDIFNetlist class for the next error, since, unlike the Design class, it is not closed source.

@jakobwenzel
Contributor

I believe I faced the second exception some time ago and managed to work around it.

My local fork of RapidWright is quite far removed from upstream (I refactored BlockStitcher to use ModuleImplsInst instead of ModuleInst). I'm not yet in a position to try and upstream all these changes.

I think at some point upstream's EDIFNetlist.copyCellAndSubCells changed to include some more checks. During stitching, the EDIF netlist contains some inconsistencies according to these new checks, which leads to the exception Ela ran into. Design.repopulateNetlistOfModuleInst can fix these inconsistencies, but it is only called after the ModuleInsts are created.

My changes should more or less boil down to changing this area:

    stitcher.stitchDesign(stitched, constraints);
    Set<String> uniqifiedNetlists = new HashSet<>();
    for (Entry<ModuleInst,EDIFNetlist> e : miMap.entrySet()) {
        //System.out.println(" MAPPINGS: " + e.getKey() + " " + e.getValue() + " " + stitcher.instNameToInst.get(e.getKey().getName()) );
        if (uniqifiedNetlists.contains(e.getValue().getName())) continue;
        uniqifiedNetlists.add(e.getValue().getName());
        stitched.repopulateNetlistOfModuleInst(e.getKey(), e.getValue());
    }

Move the call stitcher.stitchDesign(stitched, constraints); down below the for loop, so that the whole area reads:

    Set<String> uniqifiedNetlists = new HashSet<>();
    for (Entry<ModuleInst,EDIFNetlist> e : miMap.entrySet()) {
        //System.out.println(" MAPPINGS: " + e.getKey() + " " + e.getValue() + " " + stitcher.instNameToInst.get(e.getKey().getName()) );
        if (uniqifiedNetlists.contains(e.getValue().getName())) continue;
        uniqifiedNetlists.add(e.getValue().getName());
        stitched.repopulateNetlistOfModuleInst(e.getKey(), e.getValue());
    }

    stitcher.stitchDesign(stitched, constraints);
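The suggested change is purely an ordering one: repopulate each distinct netlist once, then stitch. The sketch below models only that ordering with hypothetical string stand-ins (instance name to netlist name) and records the calls it would make; it is not RapidWright code. It also uses the boolean returned by Set.add in place of the contains/add pair, which behaves the same way.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of the reordering: repopulate each distinct netlist
 * once, then stitch. Strings stand in for ModuleInst/EDIFNetlist; the
 * returned list only records the order of the calls.
 */
class RepopulateThenStitch {

    static List<String> run(Map<String, String> miToNetlistName) {
        List<String> steps = new ArrayList<>();
        Set<String> uniquifiedNetlists = new HashSet<>();
        for (Map.Entry<String, String> e : miToNetlistName.entrySet()) {
            // Set.add returns false if this netlist name was already handled.
            if (!uniquifiedNetlists.add(e.getValue())) continue;
            steps.add("repopulate " + e.getKey() + " from " + e.getValue());
        }
        steps.add("stitchDesign"); // moved after the loop, per the suggested fix
        return steps;
    }
}
```

With instances mi0 and mi1 sharing netA and mi2 using netB, only mi0 and mi2 trigger a repopulate, and stitching happens last.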

Hope this helps.

@clavin-xlnx
Member

clavin-xlnx commented Jan 4, 2025

> Hope this helps.

Thanks @jakobwenzel for your input. I attempted to make this change, but unfortunately I did not see any meaningful improvement in the output. I do have a set of changes that appear to resolve the NPE and allow a DCP to be loaded in Vivado.

Here are the set of steps in order to run the test case with the fix (in Linux):

git clone https://github.com/Xilinx/RapidWright.git --branch block_stitcher
cd RapidWright
./gradlew compileJava
export PATH=`pwd`/bin:$PATH
export CLASSPATH=`pwd`/bin:`pwd`/jars/*
mkdir debugRW
cd debugRW
wget https://github.com/ElaHobby/debugRW/raw/refs/heads/main/RW.zip
unzip RW.zip
wget https://github.com/ElaHobby/debugRW/raw/refs/heads/main/IPCache.zip
unzip IPCache.zip
cd ..
java com.xilinx.rapidwright.ipi.BlockStitcher debugRW debugRW/Debug/design_1.edf debugRW/Debug/design_1_ips.txt 
vivado -source debugRW/Debug/design_1_placed_load.tcl 

Note that the design contains encrypted cells, so we need to use a Tcl script to correctly reload the design into Vivado (unfortunately, the BlockStitcher hides this message from the user).


@ElaHobby
Contributor Author

ElaHobby commented Jan 4, 2025

Thanks, @jakobwenzel and @clavin-xlnx!

Yuhuuu, this fixes the issue on my side as well. Thanks a lot!! It's a great Christmas present, 'cause I wanted to use the flow with pre-implemented blocks for my next project too.

From my side we can close the issue.
