
Fix: fix the bug that uninitialized tensor cannot be found #1933

Closed
bowang007 wants to merge 1 commit from the fix_find_uninitialized_value branch

Conversation

bowang007
Collaborator

This PR fixes the bug where an uninitialized tensor cannot be found. Previously, the uninitialized value was copied between subgraphs, which introduced unnecessary complexity and could cause bugs. In this PR, uninitialized values are handled like constants: one uninitialized value is created for each subgraph that needs it.
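To make the approach concrete, here is a minimal sketch of creating a per-subgraph uninitialized value; the helper name materializeUninitialized and the value_map parameter are illustrative assumptions, not the PR's actual code:

#include <torch/csrc/jit/ir/ir.h>
#include <unordered_map>

// Hypothetical sketch: instead of passing a prim::Uninitialized value across
// subgraph boundaries, create a fresh prim::Uninitialized node inside each
// subgraph that needs it, mirroring how constants are cloned per subgraph.
void materializeUninitialized(
    torch::jit::Graph& subgraph,
    torch::jit::Value* external_uninitialized,
    std::unordered_map<torch::jit::Value*, torch::jit::Value*>& value_map) {
  // Create a local prim::Uninitialized with the same type as the external value
  auto* local = subgraph.createUninitialized(external_uninitialized->type());
  subgraph.prependNode(local);
  // Rewire later uses inside this subgraph to the local copy instead of a
  // cross-subgraph reference
  value_map[external_uninitialized] = local->output();
}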

Fixes # (issue)

Type of change

  • Bug fix (non-breaking change which fixes an issue)

@github-actions github-actions bot left a comment

There are some changes that do not conform to C++ style guidelines:

diff --git a/home/runner/work/TensorRT/TensorRT/core/partitioning/partitioningctx/PartitioningCtx.cpp b/tmp/changes.txt
index d708f44..ae208be 100644
--- a/home/runner/work/TensorRT/TensorRT/core/partitioning/partitioningctx/PartitioningCtx.cpp
+++ b/tmp/changes.txt
@@ -16,7 +16,8 @@ PartitioningCtx::PartitioningCtx(torch::jit::Block* b, PartitioningInfo info)

void PartitioningCtx::_load_nodes_into_decision_map(torch::jit::Block* b) {
  // won't load nodes if these nodes are in prim::loop or if these nodes are 2-level nested
-  if (b->owningNode() && (b->owningNode()->kind() == torch::jit::prim::Loop || b->owningNode()->owningBlock()->owningNode()))
+  if (b->owningNode() &&
+      (b->owningNode()->kind() == torch::jit::prim::Loop || b->owningNode()->owningBlock()->owningNode()))
    return;

  original_blocks.push_back(b);
ERROR: Some files do not conform to style guidelines

@github-actions github-actions bot left a comment

Code conforms to Python style guidelines


@@ -15,7 +15,8 @@ PartitioningCtx::PartitioningCtx(torch::jit::Block* b, PartitioningInfo info)
 }

 void PartitioningCtx::_load_nodes_into_decision_map(torch::jit::Block* b) {
-  if (b->owningNode() && b->owningNode()->kind() == torch::jit::prim::Loop)
+  // won't load nodes if these nodes are in prim::loop or if these nodes are 2-level nested
+  if (b->owningNode() && (b->owningNode()->kind() == torch::jit::prim::Loop || b->owningNode()->owningBlock()->owningNode()))
Collaborator

Are there ways that b->owningNode()->owningBlock()->owningNode() could cause a segfault? For example, in the case where the whole graph is a single block, would b->owningNode()->owningBlock() be a nullptr?

Collaborator Author

It won't. In that case, b->owningNode() is null, so the && short-circuits and the rest of the condition is never evaluated.
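For clarity, an equivalent rewrite of that condition (illustrative only, not the PR's code):

// owningNode() returns nullptr for a graph's top-level block, so the &&
// short-circuits there and owningBlock() is never called through a null pointer.
torch::jit::Node* owner = b->owningNode();
if (owner && // nullptr for the top-level block: evaluation stops here
    (owner->kind() == torch::jit::prim::Loop ||
     owner->owningBlock()->owningNode())) { // reached only when owner != nullptr
  return;
}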

Collaborator

@gs-olive gs-olive left a comment

Overall, the changes look good and this seems to resolve the main issue (pending #1823 verification). I just had some robustness questions about the nested prim::If check (mentioned below).

Also, I tested this PR out on #1756 and the first bug (RuntimeError: Expected Tensor but got Uninitialized) no longer appears, and it now shows:

RuntimeError: [Error thrown at core/partitioning/shape_analysis.cpp:183] Expected ivalues_maps.count(input) to be true but got false
Could not find torch::jit::Value* x.14 produced from %x.14 : Tensor = aten::embedding(%self.embedding.weight, %x.1, %padding_idx.42, %self.transformer_encoder.layers.0.training.21, %self.transformer_encoder.layers.0.training.21) # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:2225:11 in lowering graph for mini graph input.

The above error is similar to #1815, #1834 - both models which I believe have nested prim::If substructure.

@bowang007
Collaborator Author

> Overall, the changes look good and this seems to resolve the main issue (pending #1823 verification). I just had some robustness questions about the nested prim::If check (mentioned below).
>
> Also, I tested this PR out on #1756 and the first bug (RuntimeError: Expected Tensor but got Uninitialized) no longer appears, and it now shows:
>
> RuntimeError: [Error thrown at core/partitioning/shape_analysis.cpp:183] Expected ivalues_maps.count(input) to be true but got false
> Could not find torch::jit::Value* x.14 produced from %x.14 : Tensor = aten::embedding(%self.embedding.weight, %x.1, %padding_idx.42, %self.transformer_encoder.layers.0.training.21, %self.transformer_encoder.layers.0.training.21) # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:2225:11 in lowering graph for mini graph input.
>
> The above error is similar to #1815, #1834 - both models which I believe have nested prim::If substructure.

Yes, these models fail because of complicated nested if blocks.
We definitely need to improve this handling for those cases.
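For illustration, a hypothetical sketch (not taken from the linked issues) of how nested prim::If blocks arise in a torch::jit::Graph; for the inner if's blocks, owningNode()->owningBlock()->owningNode() is the outer prim::If, which is exactly the 2-level nesting the partitioning check currently skips:

#include <torch/csrc/jit/ir/ir.h>

std::shared_ptr<torch::jit::Graph> makeNestedIfGraph() {
  auto g = std::make_shared<torch::jit::Graph>();
  auto cond = g->addInput("cond")->setType(torch::jit::BoolType::get());

  // Outer conditional with then/else blocks
  auto outer_if = g->create(torch::jit::prim::If, {cond}, 0);
  g->appendNode(outer_if);
  auto outer_then = outer_if->addBlock();
  outer_if->addBlock(); // else branch

  // Inner conditional nested inside the outer "then" block
  auto inner_if = g->create(torch::jit::prim::If, {cond}, 0);
  outer_then->appendNode(inner_if);
  inner_if->addBlock();
  inner_if->addBlock();
  return g;
}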

Collaborator

@gs-olive gs-olive left a comment

Thanks for the replies and comments - this makes sense, and I think the bugfix looks good then. Could #1823 or similar be adapted into a regression test case here?

@bowang007
Collaborator Author

> Thanks for the replies and comments - this makes sense, and I think the bugfix looks good then. Could #1823 or similar be adapted into a regression test case here?

Sure, let me add that as a test case later.


@bowang007 bowang007 force-pushed the fix_find_uninitialized_value branch from 444617b to 0615d2d Compare July 10, 2023 18:30
@github-actions github-actions bot added the component: tests Issues re: Tests label Jul 10, 2023
@github-actions github-actions bot left a comment

There are some changes that do not conform to C++ style guidelines:

diff --git a/home/runner/work/TensorRT/TensorRT/core/partitioning/partitioningctx/PartitioningCtx.cpp b/tmp/changes.txt
index 559f266..6f10c51 100644
--- a/home/runner/work/TensorRT/TensorRT/core/partitioning/partitioningctx/PartitioningCtx.cpp
+++ b/tmp/changes.txt
@@ -16,7 +16,8 @@ PartitioningCtx::PartitioningCtx(torch::jit::Block* b, PartitioningInfo info)

void PartitioningCtx::_load_nodes_into_decision_map(torch::jit::Block* b) {
  // won't load nodes if these nodes are in prim::loop or if these nodes are 2-level nested
-  if (b->owningNode() && (b->owningNode()->kind() == torch::jit::prim::Loop || b->owningNode()->owningBlock()->owningNode()))
+  if (b->owningNode() &&
+      (b->owningNode()->kind() == torch::jit::prim::Loop || b->owningNode()->owningBlock()->owningNode()))
    return;

  original_blocks.push_back(b);
diff --git a/home/runner/work/TensorRT/TensorRT/tests/core/partitioning/test_segmentation.cpp b/tmp/changes.txt
index 39f7e6a..95dfd77 100644
--- a/home/runner/work/TensorRT/TensorRT/tests/core/partitioning/test_segmentation.cpp
+++ b/tmp/changes.txt
@@ -338,14 +338,13 @@ TEST(Partitioning, ContainUninitializedValueCorrectly) {
  auto ivalue_1 = g->insertConstant(torch::jit::IValue(1));
  auto ivalue_2 = g->insertConstant(torch::jit::IValue(2));

-    auto uninitialized_node = g->createUninitialized(torch::jit::BoolType::get());
+  auto uninitialized_node = g->createUninitialized(torch::jit::BoolType::get());
  g->appendNode(uninitialized_node);

  auto x_dim = g->create(torch::jit::aten::dim, {x}, 1);
  g->appendNode(x_dim);
  x_dim->output()->setType(torch::jit::IntType::get());

-
  auto eq1 = g->create(torch::jit::aten::eq, {ivalue_1, x_dim->output()}, 1);
  g->appendNode(eq1);
  eq1->output()->setType(torch::jit::BoolType::get());
@@ -354,10 +353,10 @@ TEST(Partitioning, ContainUninitializedValueCorrectly) {
  auto exception_val = g->insertConstant(except);
  auto if_node = g->create(torch::jit::prim::If, {eq1->output()}, 1);
  auto if_block_0 = if_node->addBlock();
-    auto exception_node = g->create(torch::jit::prim::RaiseException, {exception_val, none_const_val}, 0);
+  auto exception_node = g->create(torch::jit::prim::RaiseException, {exception_val, none_const_val}, 0);
  if_block_0->appendNode(exception_node);
  if_block_0->registerOutput(uninitialized_node->output());
-  
+
  auto if_block_1 = if_node->addBlock();
  if_block_1->registerOutput(eq1->output());

ERROR: Some files do not conform to style guidelines


@github-actions github-actions bot left a comment

Code conforms to C++ style guidelines


@bowang007 bowang007 force-pushed the fix_find_uninitialized_value branch from 04bd16e to b31cacf Compare September 29, 2023 17:45

Collaborator

@gs-olive gs-olive left a comment

Made a small comment on a conditional. Once that is addressed, it looks good.

@@ -118,7 +118,7 @@ void setNonTensorConnectedNodes(PartitioningCtx* ctx, std::vector<torch::jit::No
     if (!isTensor(output)) {
       for (auto use : output->uses()) {
         auto node = use.user;
-        if (node->kind() != torch::jit::prim::Constant && ctx->shouldNodeRunInTensorRT(node)) {
+        if (isConstantOrUninitialized(node) && ctx->shouldNodeRunInTensorRT(node)) {
Collaborator

Should this be !isConstantOrUninitialized?

Collaborator Author

Yes, updated that just now.
Thanks!
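For reference, a hedged reconstruction of the helper under discussion; the body below is an assumption based on the name, not necessarily the PR's implementation:

// Hypothetical reconstruction: true for nodes producing constants or
// uninitialized values, which the traversal should skip just as it
// previously skipped prim::Constant.
bool isConstantOrUninitialized(torch::jit::Node* node) {
  return node->kind() == torch::jit::prim::Constant ||
         node->kind() == torch::jit::prim::Uninitialized;
}

// Corrected usage with the negation applied:
// if (!isConstantOrUninitialized(node) && ctx->shouldNodeRunInTensorRT(node)) { ... }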

@bowang007 bowang007 requested a review from gs-olive October 7, 2023 05:36

Collaborator

@gs-olive gs-olive left a comment

Looks good to me - just needs a rebase to get the test cases updated, then should be good to go.

This PR fixes the bug where an uninitialized tensor cannot be found.
Previously, the uninitialized value was copied between subgraphs, which
introduced unnecessary complexity and could cause bugs. In this PR,
uninitialized values are handled like constants: one uninitialized
value is created for each subgraph that needs it.
@bowang007 bowang007 force-pushed the fix_find_uninitialized_value branch from 6dd54c0 to 4f6ddbe Compare January 10, 2024 05:27

@narendasan narendasan closed this Apr 12, 2024

Successfully merging this pull request may close these issues.

🐛 [Bug] Cannot convert simple torchscript containing two torch.nn.Upsample operations