Do different proc()s allow shared hardware resources? #1906
Replies: 4 comments
-
Currently, procs are always spawned with as much hardware as they require - because otherwise, we'd end up adding area & reducing throughput in order to provide guards to ensure that different uses don't conflict with each other. As of today, if you want to share units, you'd need to write the sharing logic yourself! If you want three 4x4 matmuls in your code, but you want to do it all with the hardware to support only one of these - you have a few options, including:
Think of it this way - XLS is trying to provide you direct control over the logic you're writing, while not requiring you to manually organize everything into a finite state machine capable of ticking at your target clock speed. The slogan version of this is that DSLX is a "mid-level synthesis" language, not high-level synthesis; the goal is to give you superpowers & control, not to take meaningful decisions away from you.

There might be an enhancement request here for some features that can allow you to opt-in various pieces for automated sharing - and in fact, we're considering some of those! However, those are more for sharing things that can be defined by functions. If it involves a proc, I think the appropriate enhancement request would be for stronger "macro-style" features that can automatically generate the simple arbiter discussed above in option (1)... but I'm not sure how feasible it would be to generate the more-complex options that can leverage more of the systolic array's parallelism, since getting optimal results generally depends on either changing or understanding the logic of the proc being wrapped.

EDIT: Just to clarify - I'm just one contributor to this project, and the above is my personal understanding. An "official" response on behalf of the project as a whole would be a different matter!
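To make the "simple arbiter" idea concrete, here is a rough behavioral model in Python (not DSLX, and not something XLS generates). It assumes a round-robin policy, which is my choice for illustration; the names `SharedMatmul`, `matmul`, and `step` are hypothetical. Each client gets its own request/response queue pair, and a single multiply unit serves at most one pending request per step.

```python
from collections import deque


def matmul(a, b):
    """Plain nested-list matrix product; stands in for the one hardware unit."""
    inner = len(b)
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


class SharedMatmul:
    """Hypothetical round-robin arbiter in front of a single matmul unit."""

    def __init__(self, num_clients):
        # One request queue and one response queue per client.
        self.requests = [deque() for _ in range(num_clients)]
        self.responses = [deque() for _ in range(num_clients)]
        self.next_client = 0  # where the round-robin search starts
        self.uses = 0         # how many times the single unit was used

    def step(self):
        """Serve at most one pending request; return the client served, or None."""
        n = len(self.requests)
        for offset in range(n):
            c = (self.next_client + offset) % n
            if self.requests[c]:
                a, b = self.requests[c].popleft()
                self.responses[c].append(matmul(a, b))
                self.next_client = (c + 1) % n
                self.uses += 1
                return c
        return None
```

Three users can then share the unit by pushing `(a, b)` pairs onto `requests[i]` and calling `step()` until it returns `None`. The cost this model makes visible is the one described above: the extra queues and selection logic are added area, and only one multiplication makes progress per step.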
-
A slightly different angle on the question that just occurred to me: it's difficult to guarantee "no concurrency, no data hazards" in a proc without doing it very purposefully. By default, procs are pipelined - this means that activation N of a proc may still be running when activation N+1 starts receiving messages. You can fix this by passing a token through state, blocking the current activation's I/O on some I/O operation from the previous activation... but we currently won't take advantage of that to "share logic".

If you genuinely want to do this using cross-activation tokens to ensure your matrix multiplications are not in conflict - and all of your matrix multiplications are coming from a single orchestrating proc - then there's an easy solution. Write your matmul proc, and spawn one copy of it from your main proc. Send each multiplication to your matmul proc, using tokens to make sure your inputs are sent (and outputs received) in the order you intend... and don't forget to thread the token through the main proc's state, so activation N+1 won't start communicating with your matmul proc out of order!

Since this is all going over a single set of input & output channels, you won't need the arbiter I mentioned above. That's only needed if you need to communicate with a single matmul proc from multiple potential users.
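As a sketch of the single-orchestrator arrangement, the Python model below (again a behavioral stand-in, not DSLX) uses one request FIFO and one response FIFO in front of one multiply unit. The names `MatmulProc` and `run_orchestrator` are hypothetical. Each loop iteration plays the role of one activation of the main proc, and the strict send-then-receive order inside the loop stands in for the token threaded through state; real pipelined procs would overlap unless that token dependency is written explicitly.

```python
from collections import deque


def matmul(a, b):
    """Plain nested-list matrix product; stands in for the matmul proc's work."""
    inner = len(b)
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


class MatmulProc:
    """The only matmul instance: one request in, one response out per step."""

    def __init__(self, req, resp):
        self.req, self.resp = req, resp
        self.activations = 0

    def step(self):
        if self.req:
            a, b = self.req.popleft()
            self.resp.append(matmul(a, b))
            self.activations += 1


def run_orchestrator(jobs):
    """Send every job through the single unit, strictly one after another."""
    req, resp = deque(), deque()  # a single pair of channels
    unit = MatmulProc(req, resp)
    results = []
    for a, b in jobs:
        req.append((a, b))              # "send" for this activation
        unit.step()                     # the shared unit does the work
        results.append(resp.popleft())  # "recv" completes before the next send
    return results, unit.activations
```

Because there is only one sender and one receiver on each channel, no arbitration logic appears anywhere in this model; ordering comes entirely from the orchestrator.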
-
Thanks for your great feedback. I want to ask one more thing for clarification. For clarity, I would like to use a concrete example: What I am wondering is whether there is any way to refer to proc B again. If B were not a proc() but a struct or another primitive datatype, we could do this very simply by passing a pointer to B into the new function or proc D. Alternatively, is there any way to refer to the channels I created in proc B from outside? I mean something like a global or "super-local" scope that lets D know about the channels of B and reuse them. Otherwise, D has no way to find the entry point or the channels of B.
-
At this point, I think this would be a great topic for a Discussion thread, more than an Issue! It might eventually produce an enhancement proposal - or there might already be a corresponding one.

However, one thing to keep in mind is that XLS channels are (currently) point-to-point. If a proc B exposes channels that A actively connects to (i.e., sends & receives messages on), then no other proc D can connect to those same channels on B! How to handle this when you want to share resources can depend a lot on the details of your specific problem, as mentioned in my last comment.

So: can you be a bit more concrete with your example? e.g., is there a reason you're specifically spawning B from A, rather than spawning B from the same "coordinator proc" that's also spawning A & D, then connecting A & D to its interface? If you can do the latter, then you are back in the same situation I mentioned in my earlier comments; since channels are point-to-point, you'll either need B to expose two sets of channels (one for each "caller") and handle the arbitration internally, or you'll need to wrap B with one of the styles of interface-arbitration proc I described in my #1876 (comment) above.
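The wiring constraint can be illustrated with a small Python model (not DSLX; `Channel`, `attach_sender`, `attach_receiver`, and `build_design` are hypothetical names). It assumes only what is stated above: each channel has exactly one sender and one receiver, and a coordinator gives A and D separate channel pairs into a single B.

```python
from collections import deque


class Channel:
    """Point-to-point FIFO: at most one sender and one receiver may attach."""

    def __init__(self, name):
        self.name = name
        self.sender = None
        self.receiver = None
        self.fifo = deque()

    def attach_sender(self, proc_name):
        if self.sender is not None:
            raise ValueError(f"{self.name}: already driven by {self.sender}")
        self.sender = proc_name

    def attach_receiver(self, proc_name):
        if self.receiver is not None:
            raise ValueError(f"{self.name}: already read by {self.receiver}")
        self.receiver = proc_name


def build_design():
    """Coordinator: one B, with a separate request/response pair per caller."""
    ports = {}
    for caller in ("A", "D"):
        req = Channel(f"{caller}_to_B")
        resp = Channel(f"B_to_{caller}")
        req.attach_sender(caller)
        req.attach_receiver("B")
        resp.attach_sender("B")
        resp.attach_receiver(caller)
        ports[caller] = (req, resp)
    return ports
```

In this model, trying to attach D as a second sender on `A_to_B` raises an error, which mirrors why D cannot reuse the channels A already occupies. B ends up with two request inputs and two response outputs, so choosing which caller to serve next becomes B's job (or the job of a wrapper around B).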
-
What's hard to do? (limit 100 words)
In DSLX, let's imagine that my design is hardware-bounded and only allowed to create about 4x4 PEs (called Node in the example matmul_4x4.x). If I spawn 4x4 Nodes at 3 different locations, from the same or different proc()s, and use them in series (no concurrency, no data hazards), then after compiling to Verilog:
(1) How much hardware do I use? Is it 4x4 PEs (because the idle Node proc()s can be reused)? Or must it be 3×(4x4) PEs?
(2) How can we share a sub-proc() across different proc()s?
(I hope you can give me some solutions, or equivalent workarounds, for this.) [Hint: I ask because we are not allowed to treat proc() as a parameter type, are we? That makes it essentially different from other valid parameter types such as arrays, structs, primitive data types, etc.]
(3) How can I index or reference a spawned proc() from another proc()?
Current best alternative workaround (limit 100 words)
/
Your view of the "best case XLS enhancement" (limit 100 words)
Is there any way to support sharing Nodes?
Alternatively, I have seen that some other languages have something like this:
option of memory