Do different proc()s allow shared hardware resources? #1906

Mars-Cat2023 · 2025-01-23T13:40:17Z


What's hard to do? (limit 100 words)

In DSLX, suppose my design is hardware-bounded and may only instantiate about 4x4 PEs (called Node in the example matmul_4x4.x). If I spawn a 4x4 grid of Nodes at 3 different locations, from the same proc() or from different ones, and those uses happen in series (no concurrency, no data hazards), then after compiling to Verilog:
(1) How much hardware do I use? Is it 4x4 PEs (because every spawn is an instance of the same Node proc()), or must it be 3×(4x4) PEs?

(2) How can a subproc() be shared across different proc()s?
(I hope you can suggest some solutions, or equivalent workarounds, for this.) [Hint: I ask because we are not allowed to treat a proc() as a parameter type, are we? That makes it essentially different from other valid parameter types such as arrays, structs, and primitive data types.]

(3) How can a spawned proc() be indexed or referenced from another proc()?

Current best alternative workaround (limit 100 words)

None.

Your view of the "best case XLS enhancement" (limit 100 words)

Is there any way to support sharing Nodes?

Alternatively, I have seen other languages offer something like this, where shared hardware (e.g., a memory) is passed to a process as an option:

```
...
let shared = spawn SharedHardware();
let p = spawn Process(memory=shared.memory);
...
```

ericastor · 2025-01-27T15:39:40Z

Maintainer

Currently, procs are always spawned with as much hardware as they require - because otherwise, we'd end up adding area & reducing throughput in order to provide guards to ensure that different uses don't conflict with each other. As of today, if you want to share units, you'd need to write the sharing logic yourself! If you want three 4x4 matmuls in your code, but you want to do it all with the hardware to support only one of these - you have a few options, including:

1. The serial approach. You can write & spawn a single matmul proc, send your matrices to multiply to it, and wait for the response. To share the matmul proc, you'll need to wrap the multiplier in a simple adapter that receives a set of matrices to multiply on one of N input channels, feeds that to the multiplier & receives the results while tracking which input channel the request came from, then forwards that to the matching output channel.
2. The self-contained parallel-dispatch arbiter. To handle the parallelism enabled by the systolic array, in reality you'd want to write a slightly more complex arbiter that can dispatch new multiplications while old multiplications are still running, and forward each set of outputs from the array to the appropriate requestor. Tracking which output goes with which request, while multiple requests are in flight, will require some more complex logic in the arbiter to understand & leverage the ordering guarantees on the outputs from the matmul proc.
3. Request tags. Rather than taking just the matrix values as inputs, your matmul proc could take an additional "request number" input, and return outputs with the associated "request number" attached. Your arbiter then becomes much simpler - you just attach a value that means "requested by user N" to each input, then forward each output result to the user it's addressed to - at the cost of extra logic inside your matmul to forward the request tag through as needed. Think of this as the Ethernet-style "routed" solution; you're implementing an address-based communications protocol.
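As a rough illustration of what option (2) involves, here is a cycle-level software model in plain Python rather than DSLX. Every name in it (`PipelinedMatmul`, `OrderTrackingArbiter`, `tick`, the fixed `latency`) is hypothetical, and it assumes the shared unit accepts at most one request per cycle and returns results in the order it received them; that in-order guarantee is what lets a FIFO of requester IDs route each result back to the right user.

```python
from collections import deque


def matmul(a, b):
    """Plain matrix product; stands in for the systolic array's arithmetic."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


class PipelinedMatmul:
    """Hypothetical shared unit: takes one request per cycle and returns
    results in request order after a fixed latency."""

    def __init__(self, latency):
        self.latency = latency
        self.in_flight = deque()  # (ready_cycle, result), oldest first

    def push(self, cycle, a, b):
        self.in_flight.append((cycle + self.latency, matmul(a, b)))

    def pop_ready(self, cycle):
        if self.in_flight and self.in_flight[0][0] <= cycle:
            return self.in_flight.popleft()[1]
        return None


class OrderTrackingArbiter:
    """Option (2) sketch: N users share one PipelinedMatmul, with several
    requests in flight at once."""

    def __init__(self, num_ports, unit):
        self.inputs = [deque() for _ in range(num_ports)]   # one input channel per user
        self.outputs = [deque() for _ in range(num_ports)]  # one output channel per user
        self.unit = unit
        self.owners = deque()  # which user issued each in-flight request, oldest first
        self.next_port = 0     # round-robin starting point

    def tick(self, cycle):
        # Forward at most one finished result; in-order completion means the
        # oldest owner is always the right destination.
        result = self.unit.pop_ready(cycle)
        if result is not None:
            self.outputs[self.owners.popleft()].append(result)
        # Dispatch at most one new request, scanning users round-robin.
        n = len(self.inputs)
        for offset in range(n):
            port = (self.next_port + offset) % n
            if self.inputs[port]:
                a, b = self.inputs[port].popleft()
                self.unit.push(cycle, a, b)
                self.owners.append(port)
                self.next_port = (port + 1) % n
                break


# Three users each submit one multiplication by the identity matrix.
identity = [[1, 0], [0, 1]]
reqs = {0: [[1, 2], [3, 4]], 1: [[5, 6], [7, 8]], 2: [[0, 1], [1, 0]]}
arb = OrderTrackingArbiter(3, PipelinedMatmul(latency=4))
for port, m in reqs.items():
    arb.inputs[port].append((m, identity))
for cycle in range(10):
    arb.tick(cycle)
```

All three requests are in flight at once (dispatched on cycles 0, 1 and 2) and come back on cycles 4, 5 and 6, each on its own user's output queue. With request tags (option 3), the `owners` FIFO would go away: the unit would carry the port number alongside each result, and the arbiter would simply read it off.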

Think of it this way - XLS is trying to provide you direct control over the logic you're writing, while not requiring you to manually organize everything into a finite state machine capable of ticking at your target clock speed. The slogan version of this is that DSLX is a "mid-level synthesis" language, not high-level synthesis; the goal is to give you superpowers & control, not to take meaningful decisions away from you.

There might be an enhancement request here for features that let you opt various pieces in to automated sharing - and in fact, we're considering some of those! However, those are more for sharing things that can be defined by functions. If it involves a proc, I think the appropriate enhancement request would be for stronger "macro-style" features that can automatically generate the simple arbiter discussed above in option (1)... but I'm not sure how feasible it would be to generate the more complex options that can leverage more of the systolic array's parallelism, since getting optimal results generally depends on either changing or understanding the logic of the proc being wrapped.

EDIT: Just to clarify - I'm just one contributor to this project, and the above is my personal understanding. An "official" response on behalf of the project as a whole would be a different matter!


ericastor · 2025-01-27T15:47:45Z

Maintainer

A slightly different angle on the question that just occurred to me: it's difficult to guarantee "no concurrency, no data hazards" in a proc without doing it very purposefully. By default, procs are pipelined - this means that activation N of a proc may still be running when activation N+1 starts receiving messages. You can fix this by passing a token through state, blocking the current activation's I/O on some I/O operation from the previous activation... but we currently won't take advantage of that to "share logic".

If you genuinely want to do this using cross-activation tokens to ensure your matrix multiplications are not in conflict - and all of your matrix multiplications are coming from a single orchestrating proc - then there's an easy solution. Write your matmul proc, and spawn one copy of it from your main proc. Send each multiplication to your matmul proc, using tokens to make sure your inputs are sent (and outputs received) in the order you intend... and don't forget to thread the token through the main proc's state, so activation N+1 won't start communicating with your matmul proc out of order!

Since this is all going over a single set of input & output channels, you won't need the arbiter I mentioned above. That's only needed if you need to communicate with a single matmul proc from multiple potential users.
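The single-orchestrator pattern can be modeled in a few lines of plain Python (again not DSLX; `MatmulProc`, `activate` and `orchestrate` are hypothetical names). One matmul instance is created and reused for every job, and the orchestrator does not send the next job until it has received the previous result, which stands in for threading the token through the main proc's state.

```python
from collections import deque


def matmul(a, b):
    """Plain matrix product; stands in for the matmul proc's arithmetic."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


class MatmulProc:
    """Hypothetical model of the one spawned matmul proc: a single request
    channel in, a single response channel out."""

    instances = 0  # counts how many copies of the hardware we "spawned"

    def __init__(self, req_ch, resp_ch):
        MatmulProc.instances += 1
        self.req_ch, self.resp_ch = req_ch, resp_ch

    def activate(self):
        # A blocking recv is modeled as "do nothing while the channel is empty".
        if self.req_ch:
            a, b = self.req_ch.popleft()
            self.resp_ch.append(matmul(a, b))


def orchestrate(jobs):
    """Main-proc model: spawns ONE MatmulProc and reuses it for every job."""
    req_ch, resp_ch = deque(), deque()
    unit = MatmulProc(req_ch, resp_ch)
    results = []
    for a, b in jobs:
        req_ch.append((a, b))  # send the next job only after the last result came back
        while not resp_ch:     # receive blocks until this job's result is ready
            unit.activate()
        results.append(resp_ch.popleft())
    return results


results = orchestrate([
    ([[1, 2], [3, 4]], [[5, 6], [7, 8]]),
    ([[2, 0], [0, 2]], [[1, 1], [1, 1]]),
    ([[0, 1], [1, 0]], [[1, 2], [3, 4]]),
])
```

Here `results` holds all three products while `MatmulProc.instances` is 1: three uses, one copy of the hardware, and no arbiter, because only one user ever drives the channel pair.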


Mars-Cat2023 · 2025-01-28T20:51:56Z

Author

Thanks for your great feedback.
Now I understand that XLS gives each spawned proc() its own hardware rather than sharing it across different proc()s. I also agree with your suggestions for sharing resources manually via serialization, an arbiter, or request tagging.

I'd like to ask one more thing for clarification, using a concrete example:
Suppose I have wrapper processes A and B with the structure A(B(C)), meaning that process A spawns exactly one process B, and process B spawns exactly one process C, where C is the base case that does the real calculation. Everything works up to this point.
Now suppose I have another process D with the structure D(B(C)), and I can guarantee that D will be called only after process A is done; in other words, B and C will only ever have one master process controlling them at any moment. I cannot create one big process that contains A and D (or D2, D3, D4, ...), because I don't know how many processes will share this B(C) process, or which ones will end up sharing it.

My question is: is there any way to refer to process B again? If B were not a proc() but a struct or another primitive data type, we could do this very simply by passing B (or a pointer to it) into D's function or process.

Alternatively, is there any way to refer to the channels created in process B from outside? For example, could I create some global or shared scope that lets D know B's channels so it can reuse them? Otherwise, D has no way to find B's entry point or channels.


ericastor · 2025-01-29T15:12:58Z

Maintainer

At this point, I think this would be a great topic for a Discussion thread, more than an Issue! It might eventually produce an enhancement proposal - or there might already be a corresponding one.

However, one thing to keep in mind is that XLS channels are (currently) point-to-point. If a proc B exposes channels that A actively connects to (i.e., sends & receives messages on), then no other proc D can connect to those same channels on B! How to handle this when you want to share resources can depend a lot on the details of your specific problem, as mentioned in my last comment.

So: can you be a bit more concrete with your example? e.g., is there a reason you're specifically spawning B from A, rather than spawning B from the same "coordinator proc" that's also spawning A & D, then connecting A & D to its interface?

If you can do the latter, then you're back in the same situation I described in my earlier comments: since channels are point-to-point, you'll either need B to expose two sets of channels (one for each "caller") and handle the arbitration internally, or you'll need to wrap B with one of the styles of interface-arbitration proc I described in my #1876 (comment) above.
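The point-to-point rule, and the "two sets of channels" option, can be sketched as a small plain-Python model. It is not DSLX and does not reflect how XLS actually checks channel connectivity; `Channel`, `bind`, `TwoPortB` and `coordinator` are hypothetical, and `compute` (squaring a number) merely stands in for the real B(C) work. The coordinator spawns B once and gives A and D each their own request/response pair, and B arbitrates between them round-robin.

```python
from collections import deque


class Channel:
    """Hypothetical point-to-point channel: one sending end, one receiving end."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.ends = {"send": None, "recv": None}

    def bind(self, role, owner):
        # A second proc trying to attach to an end that is already taken is an error.
        if self.ends[role] is not None:
            raise ValueError(f"{self.name}: {role} end already bound to {self.ends[role]}")
        self.ends[role] = owner


class TwoPortB:
    """B exposing one request/response pair per caller, arbitrating internally."""

    def __init__(self, reqs, resps, compute):
        for ch in reqs:
            ch.bind("recv", "B")
        for ch in resps:
            ch.bind("send", "B")
        self.reqs, self.resps, self.compute = reqs, resps, compute
        self.next_port = 0  # round-robin starting point

    def activate(self):
        # Serve at most one pending request, answering on the matching output.
        n = len(self.reqs)
        for offset in range(n):
            port = (self.next_port + offset) % n
            if self.reqs[port].queue:
                x = self.reqs[port].queue.popleft()
                self.resps[port].queue.append(self.compute(x))
                self.next_port = (port + 1) % n
                return


def coordinator():
    """Spawns B once, then wires A and D to their own channel pairs into B."""
    req_a, resp_a = Channel("req_a"), Channel("resp_a")
    req_d, resp_d = Channel("req_d"), Channel("resp_d")
    b = TwoPortB([req_a, req_d], [resp_a, resp_d], compute=lambda x: x * x)
    req_a.bind("send", "A")
    resp_a.bind("recv", "A")
    req_d.bind("send", "D")
    resp_d.bind("recv", "D")
    return b, (req_a, resp_a), (req_d, resp_d)


b, (req_a, resp_a), (req_d, resp_d) = coordinator()
req_a.queue.append(3)  # A uses B first...
b.activate()
a_result = resp_a.queue.popleft()
req_d.queue.append(4)  # ...then D reuses the same B instance
b.activate()
d_result = resp_d.queue.popleft()

try:
    req_a.bind("send", "D")  # D cannot also attach to A's request channel
    rebind_error = None
except ValueError as err:
    rebind_error = str(err)
```

In this model `a_result` is 9 and `d_result` is 16, both served by the same `TwoPortB`, while D's attempt to attach to A's request channel fails. Supporting a third caller here means giving B a third channel pair.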

