
#17215: Add write/read APIs for TTNN tensors allocated on mesh buffer #17513

Merged: 6 commits into main from omilyutin/mesh-tensor-rw on Feb 7, 2025

Conversation

omilyutin-tt (Contributor) commented Feb 3, 2025

Ticket

#17215

Problem description

Tensors allocated on a mesh buffer (aka "mesh tensors") need write and read APIs exposed to TTNN.

What's changed

  • Extended the mesh CQ interface to read/write individual shards, to accommodate TTNN multi-device sharding APIs.
    • Future work includes parallelizing the per-device dispatches internally, within Metal.
  • Added to_device_mesh_tensor and to_host_mesh_tensor, which will be the main APIs used in TTNN to read/write mesh buffer tensors (see the sketch below).
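A hedged sketch of the intended call pattern. Only the two function names come from this PR; the signatures, the MeshDevice argument, and the round-trip helper are assumptions for illustration:

```cpp
#include <memory>

// Hypothetical usage sketch - the signatures below are assumptions, not
// taken from the PR diff; only the names to_device_mesh_tensor and
// to_host_mesh_tensor come from this PR.
Tensor roundtrip_through_mesh(const Tensor& host_tensor, std::shared_ptr<MeshDevice> mesh_device) {
    // Write each shard to the mesh buffer through the mesh command queue.
    Tensor device_tensor = to_device_mesh_tensor(host_tensor, mesh_device);
    // Read the shards back into host storage.
    return to_host_mesh_tensor(device_tensor, /*blocking=*/true);
}
```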


```diff
@@ -378,6 +378,9 @@ struct MultiDeviceStorage {

 using Storage = std::variant<OwnedStorage, DeviceStorage, BorrowedStorage, MultiDeviceHostStorage, MultiDeviceStorage>;

+template <typename T>
+concept OwnedOrBorrowedStorage = std::is_same_v<T, OwnedStorage> || std::is_same_v<T, BorrowedStorage>;
```
Member: consider HostStorage

omilyutin-tt (Contributor Author): There is MultiDeviceHostStorage, unfortunately. I think we need to do a better job of creating a hierarchy here - e.g., HostTensor as a collection of buffers (which can be owned or borrowed) + DeviceTensor that will always be backed by MeshBuffer (eventually). "Owned" vs. "Borrowed" sounds to me like a lower-level concept, an implementation detail of the buffer, not of the whole tensor storage variant.
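A minimal sketch of the hierarchy described above. The names HostBuffer, HostTensorStorage, and DeviceTensorStorage are hypothetical illustrations, not code from this PR:

```cpp
#include <cstddef>
#include <memory>
#include <variant>
#include <vector>

// Hypothetical sketch of the suggested hierarchy: owned vs. borrowed becomes
// a per-buffer detail, while tensor-level storage is just host or device.
struct OwnedBuffer { std::vector<std::byte> data; };                        // storage owns the bytes
struct BorrowedBuffer { std::byte* data = nullptr; std::size_t size = 0; }; // externally owned

using HostBuffer = std::variant<OwnedBuffer, BorrowedBuffer>;

struct HostTensorStorage {
    std::vector<HostBuffer> buffers;  // one per shard for multi-device host tensors
};

struct MeshBuffer;  // stands in for tt::tt_metal::distributed::MeshBuffer

struct DeviceTensorStorage {
    std::shared_ptr<MeshBuffer> buffer;  // always mesh-backed, eventually
};
```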

```diff
     CommandQueue& cq, std::shared_ptr<Buffer> device_buffer, void* host_buffer_data, bool blocking) {
     EnqueueReadBuffer(cq, device_buffer, host_buffer_data, blocking);
 }

 template <typename T>
-inline void read_data_from_device_buffer(std::shared_ptr<Buffer> device_buffer, std::vector<T>& host_buffer) {
+void read_data_from_device_buffer(std::shared_ptr<Buffer> device_buffer, std::vector<T>& host_buffer) {
     ::tt::tt_metal::detail::ReadFromBuffer(device_buffer, host_buffer);
```
Member: thank you

Member: what's the difference between this and EnqueueReadBuffer above?

ayerofieiev-tt (Member) commented Feb 3, 2025: Is this a slow dispatch path? If so, can you please mark the method deprecated, with a comment that it's a slow dispatch path?

Contributor: Slow dispatch is not deprecated. It's actively used for all sorts of bring-up, debugging, and experiments. Marking it deprecated would imply that it needs to be cleaned out of the codebase at some point, which is not the case.

Member: I think TT-NN should not need to care about this.

omilyutin-tt (Contributor Author): +1. TTNN needs to rely on the single API; internally we might fall back to slow dispatch if needed.
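A minimal sketch of that single-API idea. The wrapper and the use_fast_dispatch mode query are hypothetical; EnqueueReadBuffer and detail::ReadFromBuffer appear in the hunk above:

```cpp
// Hypothetical wrapper: TTNN calls one read API, and the dispatch mode stays
// an internal implementation detail. Assumes host_buffer is already sized
// for the read.
template <typename T>
void read_buffer(CommandQueue& cq, std::shared_ptr<Buffer> device_buffer, std::vector<T>& host_buffer) {
    if (use_fast_dispatch()) {  // hypothetical mode query
        // Fast dispatch: goes through the command queue.
        EnqueueReadBuffer(cq, device_buffer, host_buffer.data(), /*blocking=*/true);
    } else {
        // Slow dispatch: direct read, used for bring-up and debugging.
        ::tt::tt_metal::detail::ReadFromBuffer(device_buffer, host_buffer);
    }
}
```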

```cpp
            }
        },
    },
    [](const auto& s) -> owned_buffer::Buffer<T> {
```
Member: good change
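The fragment above appears to be part of a visitor over the Storage variant. A hedged reconstruction of how the new OwnedOrBorrowedStorage concept can collapse the owned and borrowed cases into one constrained lambda (get_host_buffer and extract_buffer are hypothetical names; only the concept itself is from this PR):

```cpp
#include <type_traits>
#include <variant>

// Sketch only - not the PR's exact code.
template <typename T>
owned_buffer::Buffer<T> get_host_buffer(const Storage& storage) {
    return std::visit(
        [](const auto& s) -> owned_buffer::Buffer<T> {
            if constexpr (OwnedOrBorrowedStorage<std::decay_t<decltype(s)>>) {
                return extract_buffer<T>(s);  // hypothetical helper
            } else {
                TT_THROW("Expected owned or borrowed storage");  // tt-metal's error macro
            }
        },
        storage);
}
```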

Resolved review threads: ttnn/cpp/ttnn/tensor/tensor_impl.cpp, tt_metal/distributed/mesh_command_queue.cpp

tt-asaigal (Contributor) left a comment: Will you be adding Python APIs for this next? It would be great to see how the TTNN code remains essentially unchanged when we switch backends.

omilyutin-tt (Contributor Author):

> Will you be adding Python APIs for this next? It would be great to see how the TTNN code remains essentially unchanged when we switch backends.

Most likely I will add a switch to use these mesh-based implementations in the existing .cpu() / to() methods. A bunch of things will fail; I will fix them incrementally as we integrate the mesh* primitives into TTNN. Eventually this will become the new and only code path.

tt-asaigal (Contributor):

> Most likely I will add a switch to use these mesh-based implementations in the existing .cpu() / to() methods.

Yes, a first step would be to add a switch to those top-level APIs, and then see what falls out when we integrate with the functions exposed by core.py. Thanks!
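A hedged sketch of the kind of switch being discussed. The flag, the legacy path, and the signatures are hypothetical; only to_host_mesh_tensor comes from this PR:

```cpp
// Hypothetical: route the existing top-level read path through the new
// mesh-based implementation behind a switch, so TTNN call sites stay unchanged.
template <typename T>
Tensor to_host(const Tensor& tensor, bool blocking) {
    if (use_mesh_tensor_apis()) {                      // hypothetical feature switch
        return to_host_mesh_tensor(tensor, blocking);  // new path from this PR
    }
    return legacy_to_host<T>(tensor, blocking);        // hypothetical existing path
}
```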

cfjchu (Collaborator) left a comment: Overall this looks good, but I need to review your changes and the assumptions you made about the 1:1 mapping of shards to devices.
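A hedged illustration of the 1:1 shard-to-device assumption under review. The method and variable names are hypothetical; the PR only states that the mesh CQ interface was extended to read/write shards:

```cpp
// Hypothetical: write one host shard per device in the mesh, in a fixed
// order. This breaks down if shards and devices are not in 1:1 correspondence.
for (std::size_t i = 0; i < host_shards.size(); ++i) {
    mesh_cq.enqueue_write_shard(mesh_buffer, host_shards[i], device_coords[i]);  // hypothetical API
}
```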

omilyutin-tt force-pushed the omilyutin/mesh-tensor-rw branch from fd642fe to ce98cd9 on February 5, 2025, 04:45
omilyutin-tt force-pushed the omilyutin/mesh-tensor-rw branch from ce98cd9 to 65105aa on February 5, 2025, 22:15
cfjchu (Collaborator): thanks for fixing!

tt-asaigal (Contributor): Latest changes look great, thanks Oleg!

omilyutin-tt merged commit a4deded into main on Feb 7, 2025 - 234 checks passed.
omilyutin-tt deleted the omilyutin/mesh-tensor-rw branch on February 7, 2025, 00:17.