odtk process gets killed when trying to export model on Xavier NX #315

Open
tahasadr opened this issue Oct 2, 2021 · 0 comments

tahasadr commented Oct 2, 2021

Hi there,
Following up on my previous issue, I tried to export my model on a Xavier NX
but ran into several problems:

  • No precompiled NVIDIA DALI
    Solved by cross-compiling it, as described in this
  • No tk module
    Solved with:
$ apt install python3-tk
  • The installed torch doesn't support NCCL
    Solved by building PyTorch from source, as stated here, with these modifications:
export USE_NCCL=1
export USE_DISTRIBUTED=1
export USE_CUDA=1
export USE_CUDNN=1
export USE_NUMPY=1
export USE_MKLDNN=1
export USE_NNPACK=1
export USE_QNNPACK=1
export USE_OPENCV=1
export USE_PYTORCH_QNNPACK=1
export TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2"
export PYTORCH_BUILD_VERSION=1.9.0
export PYTORCH_BUILD_NUMBER=1
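Before moving on to odtk, it may be worth confirming that the source-built wheel actually picked up these flags. The sketch below is a hypothetical helper (check_torch_build is an illustrative name, not part of odtk or PyTorch) that only uses standard PyTorch introspection calls. With a working CUDA build on a Xavier NX, the reported device capability should be (7, 2), which corresponds to the "7.2" entry in TORCH_CUDA_ARCH_LIST.

```python
import importlib.util


def check_torch_build():
    """Report which of the requested build features the installed PyTorch has.

    Hypothetical helper. Returns {"installed": False} if torch cannot be found.
    """
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch
    import torch.distributed as dist

    report = {
        "installed": True,
        "version": torch.__version__,            # e.g. what PYTORCH_BUILD_VERSION produced
        "cuda_available": torch.cuda.is_available(),
        "cuda_version": torch.version.cuda,      # None for a CPU-only build
        "cudnn_available": torch.backends.cudnn.is_available(),
        "distributed_available": dist.is_available(),  # USE_DISTRIBUTED=1
    }
    # is_nccl_available() only exists when the distributed package was built in,
    # so short-circuit on dist.is_available() first (USE_NCCL=1).
    report["nccl_available"] = dist.is_available() and dist.is_nccl_available()
    if report["cuda_available"]:
        # Xavier NX should report (7, 2)
        report["device_capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in check_torch_build().items():
        print(f"{key}: {value}")
```

If nccl_available comes back False, the NCCL flags most likely did not take effect during the build.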

My config:

  • JetPack 4.5.1
  • Docker Image: nvcr.io/nvidia/l4t-ml:r32.5.0-py3
  • PyTorch 1.9 built from source
  • DALI 1.7
  • CUDA 10.2
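To double-check that the container actually sees this stack, a small stdlib-only script can read the L4T release string and probe for the Python modules involved in the fixes above. This is a sketch under assumptions: it assumes /etc/nv_tegra_release uses the usual "# R32 (release), REVISION: x.y, ..." first line and is visible inside the container (it may not be), and parse_l4t_release / module_available are hypothetical names. JetPack 4.5.1 corresponds to L4T R32.5.1.

```python
import importlib.util
import re

# Assumed format of the first line of /etc/nv_tegra_release on L4T, e.g.
# "# R32 (release), REVISION: 5.1, GCID: ..., BOARD: ..., EABI: aarch64, DATE: ..."
_L4T_RE = re.compile(r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*([\d.]+)")


def parse_l4t_release(text):
    """Return (major, revision) such as ("32", "5.1"), or None if the text doesn't match."""
    m = _L4T_RE.search(text)
    return (m.group(1), m.group(2)) if m else None


def module_available(name):
    """True if the module `name` can be located.

    For dotted names, find_spec imports the parent package first and raises
    ModuleNotFoundError when that parent is missing, so treat that as "missing".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    try:
        with open("/etc/nv_tegra_release") as f:
            print("L4T release:", parse_l4t_release(f.read()))
    except FileNotFoundError:
        print("L4T release: /etc/nv_tegra_release not visible "
              "(not a Jetson, or not mounted in this container)")
    # Modules involved in the fixes listed above: tk, DALI, and the source-built torch
    for mod in ("tkinter", "nvidia.dali", "torch"):
        print(f"{mod}: {'found' if module_available(mod) else 'missing'}")
```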

Now, when I try to export my model, the process gets killed:

$ odtk export fine_tune_from_rn50fpn.pth engine.plan
NOTE! Installing ujson may make loading annotations faster.
Loading model from fine_tune_from_rn50fpn.pth...
     model: RetinaNet
  backbone: ResNet50FPN
   classes: 6, anchors: 9
Exporting to ONNX...
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  ../c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
/usr/local/lib/python3.6/dist-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
Building FP16 core model...
Building accelerated plugins...
Applying optimizations and building TRT CUDA engine...
Killed

Any ideas?
@YashNV
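A bare "Killed" with no Python traceback typically means the process received SIGKILL, and on Linux a common sender is the kernel's OOM killer. The Xavier NX shares its RAM between the CPU and GPU, and the TensorRT engine-building step can need a lot of memory. Assuming that is the cause here (not confirmed), one way to see how much headroom is left is to read /proc/meminfo right before running odtk export; the kernel log (dmesg) after the kill would usually contain an out-of-memory message if the OOM killer fired. The helpers below are hypothetical names for a minimal stdlib-only sketch.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value} (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            # Most values carry a "kB" suffix; counters like HugePages_Total have no unit.
            info[key.strip()] = int(fields[0])
    return info


def memory_summary_mib(info):
    """Convert the fields most relevant to an OOM kill to MiB (missing fields count as 0)."""
    keys = ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree")
    return {k: info.get(k, 0) // 1024 for k in keys}


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            for key, mib in memory_summary_mib(parse_meminfo(f.read())).items():
                print(f"{key}: {mib} MiB")
    else:
        print("/proc/meminfo not found (not a Linux system?)")
```

If MemAvailable is low and SwapTotal is 0, adding a swap file is a workaround people often try on Jetson boards, though whether it is enough for this particular model is untested here.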
