Fix: fix the bug that uninitialized tensor cannot be found #1933
Conversation
There are some changes that do not conform to C++ style guidelines:
diff --git a/home/runner/work/TensorRT/TensorRT/core/partitioning/partitioningctx/PartitioningCtx.cpp b/tmp/changes.txt
index d708f44..ae208be 100644
--- a/home/runner/work/TensorRT/TensorRT/core/partitioning/partitioningctx/PartitioningCtx.cpp
+++ b/tmp/changes.txt
@@ -16,7 +16,8 @@ PartitioningCtx::PartitioningCtx(torch::jit::Block* b, PartitioningInfo info)
void PartitioningCtx::_load_nodes_into_decision_map(torch::jit::Block* b) {
// won't load nodes if these nodes are in prim::loop or if these nodes are 2-level nested
- if (b->owningNode() && (b->owningNode()->kind() == torch::jit::prim::Loop || b->owningNode()->owningBlock()->owningNode()))
+ if (b->owningNode() &&
+ (b->owningNode()->kind() == torch::jit::prim::Loop || b->owningNode()->owningBlock()->owningNode()))
return;
original_blocks.push_back(b);
ERROR: Some files do not conform to style guidelines
Code conforms to Python style guidelines
@@ -15,7 +15,8 @@ PartitioningCtx::PartitioningCtx(torch::jit::Block* b, PartitioningInfo info)
 }
 
 void PartitioningCtx::_load_nodes_into_decision_map(torch::jit::Block* b) {
-  if (b->owningNode() && b->owningNode()->kind() == torch::jit::prim::Loop)
+  // won't load nodes if these nodes are in prim::loop or if these nodes are 2-level nested
+  if (b->owningNode() && (b->owningNode()->kind() == torch::jit::prim::Loop || b->owningNode()->owningBlock()->owningNode()))
Are there ways that b->owningNode()->owningBlock()->owningNode() could cause a segfault? For example, in the case where the whole graph is a single block, would b->owningNode()->owningBlock() be a nullptr?
It won't. In that case, b->owningNode() is null, so the rest of the if statement is short-circuited and never evaluated.
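For illustration, here is a minimal C++ sketch of the short-circuit behavior being relied on; the function name is illustrative and not from the PR:

#include <torch/csrc/jit/ir/ir.h>

// Sketch: if b is the graph's top-level block, b->owningNode() returns nullptr,
// the && short-circuits, and owningBlock()->owningNode() is never evaluated,
// so no null pointer is dereferenced.
bool shouldSkipBlock(torch::jit::Block* b) {
  torch::jit::Node* owner = b->owningNode();
  return owner &&
      (owner->kind() == torch::jit::prim::Loop || owner->owningBlock()->owningNode() != nullptr);
}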
Overall, the changes look good and this seems to resolve the main issue (pending #1823 verification). I just had some robustness questions about the nested prim::If check (mentioned below).
Also, I tested this PR out on #1756 and the first bug (RuntimeError: Expected Tensor but got Uninitialized) no longer appears, and it now shows:
RuntimeError: [Error thrown at core/partitioning/shape_analysis.cpp:183] Expected ivalues_maps.count(input) to be true but got false
Could not find torch::jit::Value* x.14 produced from %x.14 : Tensor = aten::embedding(%self.embedding.weight, %x.1, %padding_idx.42, %self.transformer_encoder.layers.0.training.21, %self.transformer_encoder.layers.0.training.21) # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:2225:11 in lowering graph for mini graph input.
The above error is similar to #1815, #1834 - both models which I believe have nested prim::If substructure.
Yes, these models fail because of complicated nested if blocks.
Thanks for the replies and comments - this makes sense, and I think the bugfix looks good then. Could #1823 or similar be adapted into a test case for regression testing here?
Sure, let me add that as a test case later.
Code conforms to Python style guidelines
Force-pushed from 444617b to 0615d2d.
There are some changes that do not conform to C++ style guidelines:
diff --git a/home/runner/work/TensorRT/TensorRT/core/partitioning/partitioningctx/PartitioningCtx.cpp b/tmp/changes.txt
index 559f266..6f10c51 100644
--- a/home/runner/work/TensorRT/TensorRT/core/partitioning/partitioningctx/PartitioningCtx.cpp
+++ b/tmp/changes.txt
@@ -16,7 +16,8 @@ PartitioningCtx::PartitioningCtx(torch::jit::Block* b, PartitioningInfo info)
void PartitioningCtx::_load_nodes_into_decision_map(torch::jit::Block* b) {
// won't load nodes if these nodes are in prim::loop or if these nodes are 2-level nested
- if (b->owningNode() && (b->owningNode()->kind() == torch::jit::prim::Loop || b->owningNode()->owningBlock()->owningNode()))
+ if (b->owningNode() &&
+ (b->owningNode()->kind() == torch::jit::prim::Loop || b->owningNode()->owningBlock()->owningNode()))
return;
original_blocks.push_back(b);
diff --git a/home/runner/work/TensorRT/TensorRT/tests/core/partitioning/test_segmentation.cpp b/tmp/changes.txt
index 39f7e6a..95dfd77 100644
--- a/home/runner/work/TensorRT/TensorRT/tests/core/partitioning/test_segmentation.cpp
+++ b/tmp/changes.txt
@@ -338,14 +338,13 @@ TEST(Partitioning, ContainUninitializedValueCorrectly) {
auto ivalue_1 = g->insertConstant(torch::jit::IValue(1));
auto ivalue_2 = g->insertConstant(torch::jit::IValue(2));
- auto uninitialized_node = g->createUninitialized(torch::jit::BoolType::get());
+ auto uninitialized_node = g->createUninitialized(torch::jit::BoolType::get());
g->appendNode(uninitialized_node);
auto x_dim = g->create(torch::jit::aten::dim, {x}, 1);
g->appendNode(x_dim);
x_dim->output()->setType(torch::jit::IntType::get());
-
auto eq1 = g->create(torch::jit::aten::eq, {ivalue_1, x_dim->output()}, 1);
g->appendNode(eq1);
eq1->output()->setType(torch::jit::BoolType::get());
@@ -354,10 +353,10 @@ TEST(Partitioning, ContainUninitializedValueCorrectly) {
auto exception_val = g->insertConstant(except);
auto if_node = g->create(torch::jit::prim::If, {eq1->output()}, 1);
auto if_block_0 = if_node->addBlock();
- auto exception_node = g->create(torch::jit::prim::RaiseException, {exception_val, none_const_val}, 0);
+ auto exception_node = g->create(torch::jit::prim::RaiseException, {exception_val, none_const_val}, 0);
if_block_0->appendNode(exception_node);
if_block_0->registerOutput(uninitialized_node->output());
-
+
auto if_block_1 = if_node->addBlock();
if_block_1->registerOutput(eq1->output());
ERROR: Some files do not conform to style guidelines
Code conforms to Python style guidelines
Code conforms to C++ style guidelines
Code conforms to Python style guidelines
Force-pushed from 59db92d to 04bd16e.
Code conforms to C++ style guidelines
Code conforms to Python style guidelines
Force-pushed from 04bd16e to b31cacf.
Code conforms to C++ style guidelines
Code conforms to Python style guidelines
Made a small comment on a conditional. Once that is addressed, it looks good.
core/partitioning/partitioning.cpp (Outdated)
@@ -118,7 +118,7 @@ void setNonTensorConnectedNodes(PartitioningCtx* ctx, std::vector<torch::jit::No
   if (!isTensor(output)) {
     for (auto use : output->uses()) {
       auto node = use.user;
-      if (node->kind() != torch::jit::prim::Constant && ctx->shouldNodeRunInTensorRT(node)) {
+      if (isConstantOrUninitialized(node) && ctx->shouldNodeRunInTensorRT(node)) {
Should this be !isConstantOrUninitialized?
Yes, updated that just now.
Thanks!
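For context, a helper like isConstantOrUninitialized presumably just checks the node kind; here is a minimal sketch under that assumption (the PR's actual implementation may differ):

#include <torch/csrc/jit/ir/ir.h>

// Sketch: treat prim::Constant and prim::Uninitialized nodes alike, since both can be
// recreated inside each subgraph instead of being passed across subgraph boundaries.
bool isConstantOrUninitialized(const torch::jit::Node* n) {
  return n->kind() == torch::jit::prim::Constant || n->kind() == torch::jit::prim::Uninitialized;
}

With that shape, the negated form !isConstantOrUninitialized(node) matches the original node->kind() != torch::jit::prim::Constant check while also excluding uninitialized nodes, which is why the reviewer asked for the negation.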
Code conforms to C++ style guidelines
Code conforms to Python style guidelines
Code conforms to C++ style guidelines
Code conforms to Python style guidelines
Looks good to me - just needs a rebase to get the test cases updated, then should be good to go.
Force-pushed from 6dd54c0 to 4f6ddbe.
Code conforms to C++ style guidelines
Code conforms to Python style guidelines
This PR fixes the bug where an uninitialized tensor cannot be found. Previously, the uninitialized value was copied between subgraphs, which introduced unnecessary complexity and could cause bugs. With this PR, the uninitialized value is handled like a constant: one uninitialized value is created for each subgraph that needs it.
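A rough sketch of the per-subgraph handling described above, assuming a helper that receives the target subgraph and the original value; the function name and insertion point are illustrative, not the PR's actual code:

#include <torch/csrc/jit/ir/ir.h>

// Sketch: rather than copying the prim::Uninitialized output across subgraph boundaries,
// create a fresh uninitialized value of the same type inside each subgraph that needs one.
torch::jit::Value* createLocalUninitialized(torch::jit::Graph* subgraph, torch::jit::Value* original) {
  auto* node = subgraph->createUninitialized(original->type());
  subgraph->prependNode(node); // hypothetical placement; the PR may insert it where the value is first used
  return node->output();
}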
Fixes # (issue)
Type of change