
Fix: fix the bug that uninitialized tensor cannot be found #1933

Closed
bowang007 wants to merge 1 commit from the fix_find_uninitialized_value branch

Conversation

bowang007
Collaborator

This PR fixes the bug where an uninitialized tensor cannot be found. Previously, the uninitialized value was copied between subgraphs, which introduced unnecessary complexity and could cause bugs. In this PR, uninitialized values are handled like constants: one uninitialized value is created for each subgraph that needs it.
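To make the approach concrete, here is a minimal sketch of creating a per-subgraph uninitialized value; the helper name materializeUninitialized and the value_map parameter are illustrative assumptions, not the PR's actual code:

#include <torch/csrc/jit/ir/ir.h>
#include <unordered_map>

// Hypothetical sketch: instead of passing a prim::Uninitialized value across
// subgraph boundaries, create a fresh prim::Uninitialized node inside each
// subgraph that needs it, mirroring how constants are cloned per subgraph.
void materializeUninitialized(
    torch::jit::Graph& subgraph,
    torch::jit::Value* external_uninitialized,
    std::unordered_map<torch::jit::Value*, torch::jit::Value*>& value_map) {
  // Create a local prim::Uninitialized with the same type as the external value
  auto* local = subgraph.createUninitialized(external_uninitialized->type());
  subgraph.prependNode(local);
  // Rewire later uses inside this subgraph to the local copy instead of a
  // cross-subgraph reference
  value_map[external_uninitialized] = local->output();
}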

Fixes # (issue)

Type of change

  • Bug fix (non-breaking change which fixes an issue)

@github-actions github-actions bot left a comment

There are some changes that do not conform to C++ style guidelines:

diff --git a/home/runner/work/TensorRT/TensorRT/core/partitioning/partitioningctx/PartitioningCtx.cpp b/tmp/changes.txt
index d708f44..ae208be 100644
--- a/home/runner/work/TensorRT/TensorRT/core/partitioning/partitioningctx/PartitioningCtx.cpp
+++ b/tmp/changes.txt
@@ -16,7 +16,8 @@ PartitioningCtx::PartitioningCtx(torch::jit::Block* b, PartitioningInfo info)

void PartitioningCtx::_load_nodes_into_decision_map(torch::jit::Block* b) {
  // won't load nodes if these nodes are in prim::loop or if these nodes are 2-level nested
-  if (b->owningNode() && (b->owningNode()->kind() == torch::jit::prim::Loop || b->owningNode()->owningBlock()->owningNode()))
+  if (b->owningNode() &&
+      (b->owningNode()->kind() == torch::jit::prim::Loop || b->owningNode()->owningBlock()->owningNode()))
    return;

  original_blocks.push_back(b);
ERROR: Some files do not conform to style guidelines

@github-actions github-actions bot left a comment

Code conforms to Python style guidelines


@@ -15,7 +15,8 @@ PartitioningCtx::PartitioningCtx(torch::jit::Block* b, PartitioningInfo info)
 }

 void PartitioningCtx::_load_nodes_into_decision_map(torch::jit::Block* b) {
-  if (b->owningNode() && b->owningNode()->kind() == torch::jit::prim::Loop)
+  // won't load nodes if these nodes are in prim::loop or if these nodes are 2-level nested
+  if (b->owningNode() && (b->owningNode()->kind() == torch::jit::prim::Loop || b->owningNode()->owningBlock()->owningNode()))
Collaborator

Are there ways that b->owningNode()->owningBlock()->owningNode() could cause a segfault? For example, in the case where the whole graph is a single block, would b->owningNode()->owningBlock() be a nullptr?

Collaborator Author

It won't. In that case, b->owningNode() is null, so the && short-circuits and the rest of the condition is never evaluated.
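For clarity, an equivalent rewrite of that condition (illustrative only, not the PR's code):

// owningNode() returns nullptr for a graph's top-level block, so the &&
// short-circuits there and owningBlock() is never called through a null pointer.
torch::jit::Node* owner = b->owningNode();
if (owner && // nullptr for the top-level block: evaluation stops here
    (owner->kind() == torch::jit::prim::Loop ||
     owner->owningBlock()->owningNode())) { // reached only when owner != nullptr
  return;
}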

Collaborator

@gs-olive gs-olive left a comment

Overall, the changes look good and this seems to resolve the main issue (pending #1823 verification). I just had some robustness questions about the nested prim::If check (mentioned below).

Also, I tested this PR out on #1756 and the first bug (RuntimeError: Expected Tensor but got Uninitialized) no longer appears, and it now shows:

RuntimeError: [Error thrown at core/partitioning/shape_analysis.cpp:183] Expected ivalues_maps.count(input) to be true but got false
Could not find torch::jit::Value* x.14 produced from %x.14 : Tensor = aten::embedding(%self.embedding.weight, %x.1, %padding_idx.42, %self.transformer_encoder.layers.0.training.21, %self.transformer_encoder.layers.0.training.21) # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:2225:11 in lowering graph for mini graph input.

The above error is similar to #1815, #1834 - both models which I believe have nested prim::If substructure.

@bowang007
Collaborator Author

> Overall, the changes look good and this seems to resolve the main issue (pending #1823 verification). I just had some robustness questions about the nested prim::If check (mentioned below).
>
> Also, I tested this PR out on #1756 and the first bug (RuntimeError: Expected Tensor but got Uninitialized) no longer appears, and it now shows:
>
> RuntimeError: [Error thrown at core/partitioning/shape_analysis.cpp:183] Expected ivalues_maps.count(input) to be true but got false
> Could not find torch::jit::Value* x.14 produced from %x.14 : Tensor = aten::embedding(%self.embedding.weight, %x.1, %padding_idx.42, %self.transformer_encoder.layers.0.training.21, %self.transformer_encoder.layers.0.training.21) # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:2225:11 in lowering graph for mini graph input.
>
> The above error is similar to #1815, #1834 - both models which I believe have nested prim::If substructure.

Yes, these models fail because of complicated nested if blocks.
We definitely need to improve this handling for those cases.
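For illustration, a hypothetical sketch (not taken from the linked issues) of how nested prim::If blocks arise in a torch::jit::Graph; for the inner if's blocks, owningNode()->owningBlock()->owningNode() is the outer prim::If, which is exactly the 2-level nesting the partitioning check currently skips:

#include <torch/csrc/jit/ir/ir.h>

std::shared_ptr<torch::jit::Graph> makeNestedIfGraph() {
  auto g = std::make_shared<torch::jit::Graph>();
  auto cond = g->addInput("cond")->setType(torch::jit::BoolType::get());

  // Outer conditional with then/else blocks
  auto outer_if = g->create(torch::jit::prim::If, {cond}, 0);
  g->appendNode(outer_if);
  auto outer_then = outer_if->addBlock();
  outer_if->addBlock(); // else branch

  // Inner conditional nested inside the outer "then" block
  auto inner_if = g->create(torch::jit::prim::If, {cond}, 0);
  outer_then->appendNode(inner_if);
  inner_if->addBlock();
  inner_if->addBlock();
  return g;
}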

Collaborator

@gs-olive gs-olive left a comment

Thanks for the replies and comments - this makes sense, and I think the bugfix looks good then. Could #1823 or similar be adapted into a regression test case here?

@bowang007
Collaborator Author

> Thanks for the replies and comments - this makes sense, and I think the bugfix looks good then. Could #1823 or similar be adapted into a regression test case here?

Sure, let me add that as a test case later.


@bowang007 bowang007 force-pushed the fix_find_uninitialized_value branch from 444617b to 0615d2d Compare July 10, 2023 18:30
@github-actions github-actions bot added the component: tests Issues re: Tests label Jul 10, 2023
@github-actions github-actions bot left a comment

There are some changes that do not conform to C++ style guidelines:

diff --git a/home/runner/work/TensorRT/TensorRT/core/partitioning/partitioningctx/PartitioningCtx.cpp b/tmp/changes.txt
index 559f266..6f10c51 100644
--- a/home/runner/work/TensorRT/TensorRT/core/partitioning/partitioningctx/PartitioningCtx.cpp
+++ b/tmp/changes.txt
@@ -16,7 +16,8 @@ PartitioningCtx::PartitioningCtx(torch::jit::Block* b, PartitioningInfo info)

void PartitioningCtx::_load_nodes_into_decision_map(torch::jit::Block* b) {
  // won't load nodes if these nodes are in prim::loop or if these nodes are 2-level nested
-  if (b->owningNode() && (b->owningNode()->kind() == torch::jit::prim::Loop || b->owningNode()->owningBlock()->owningNode()))
+  if (b->owningNode() &&
+      (b->owningNode()->kind() == torch::jit::prim::Loop || b->owningNode()->owningBlock()->owningNode()))
    return;

  original_blocks.push_back(b);
diff --git a/home/runner/work/TensorRT/TensorRT/tests/core/partitioning/test_segmentation.cpp b/tmp/changes.txt
index 39f7e6a..95dfd77 100644
--- a/home/runner/work/TensorRT/TensorRT/tests/core/partitioning/test_segmentation.cpp
+++ b/tmp/changes.txt
@@ -338,14 +338,13 @@ TEST(Partitioning, ContainUninitializedValueCorrectly) {
  auto ivalue_1 = g->insertConstant(torch::jit::IValue(1));
  auto ivalue_2 = g->insertConstant(torch::jit::IValue(2));

-    auto uninitialized_node = g->createUninitialized(torch::jit::BoolType::get());
+  auto uninitialized_node = g->createUninitialized(torch::jit::BoolType::get());
  g->appendNode(uninitialized_node);

  auto x_dim = g->create(torch::jit::aten::dim, {x}, 1);
  g->appendNode(x_dim);
  x_dim->output()->setType(torch::jit::IntType::get());

-
  auto eq1 = g->create(torch::jit::aten::eq, {ivalue_1, x_dim->output()}, 1);
  g->appendNode(eq1);
  eq1->output()->setType(torch::jit::BoolType::get());
@@ -354,10 +353,10 @@ TEST(Partitioning, ContainUninitializedValueCorrectly) {
  auto exception_val = g->insertConstant(except);
  auto if_node = g->create(torch::jit::prim::If, {eq1->output()}, 1);
  auto if_block_0 = if_node->addBlock();
-    auto exception_node = g->create(torch::jit::prim::RaiseException, {exception_val, none_const_val}, 0);
+  auto exception_node = g->create(torch::jit::prim::RaiseException, {exception_val, none_const_val}, 0);
  if_block_0->appendNode(exception_node);
  if_block_0->registerOutput(uninitialized_node->output());
-  
+
  auto if_block_1 = if_node->addBlock();
  if_block_1->registerOutput(eq1->output());

ERROR: Some files do not conform to style guidelines


@github-actions github-actions bot left a comment

Code conforms to C++ style guidelines


@bowang007 bowang007 force-pushed the fix_find_uninitialized_value branch from 04bd16e to b31cacf Compare September 29, 2023 17:45

Collaborator

@gs-olive gs-olive left a comment

Made a small comment on a conditional. Once that is addressed, it looks good.

@@ -118,7 +118,7 @@ void setNonTensorConnectedNodes(PartitioningCtx* ctx, std::vector<torch::jit::No
     if (!isTensor(output)) {
       for (auto use : output->uses()) {
         auto node = use.user;
-        if (node->kind() != torch::jit::prim::Constant && ctx->shouldNodeRunInTensorRT(node)) {
+        if (isConstantOrUninitialized(node) && ctx->shouldNodeRunInTensorRT(node)) {
Collaborator

Should this be !isConstantOrUninitialized?

Collaborator Author

Yes, updated that just now.
Thanks!
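For reference, a hedged reconstruction of the helper under discussion; the body below is an assumption based on the name, not necessarily the PR's implementation:

// Hypothetical reconstruction: true for nodes producing constants or
// uninitialized values, which the traversal should skip just as it
// previously skipped prim::Constant.
bool isConstantOrUninitialized(torch::jit::Node* node) {
  return node->kind() == torch::jit::prim::Constant ||
         node->kind() == torch::jit::prim::Uninitialized;
}

// Corrected usage with the negation applied:
// if (!isConstantOrUninitialized(node) && ctx->shouldNodeRunInTensorRT(node)) { ... }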

@bowang007 bowang007 requested a review from gs-olive October 7, 2023 05:36

Collaborator

@gs-olive gs-olive left a comment

Looks good to me - just needs a rebase to get the test cases updated, then should be good to go.

This PR fixes the bug where an uninitialized tensor cannot be found.
Previously, the uninitialized value was copied between subgraphs, which
introduced unnecessary complexity and could cause bugs. In this PR,
uninitialized values are handled like constants: one uninitialized
value is created for each subgraph that needs it.
@bowang007 bowang007 force-pushed the fix_find_uninitialized_value branch from 6dd54c0 to 4f6ddbe Compare January 10, 2024 05:27

@narendasan narendasan closed this Apr 12, 2024

Successfully merging this pull request may close these issues.

🐛 [Bug] Cannot convert simple torchscript containing two torch.nn.Upsample operations