Currently, the Python side exports PyTorch models to disk in a PyTorch-specific format via torch.jit.trace(), and the C++ side loads them via torch::jit::load() (a rough sketch of this path follows the list of gripes below). I have many gripes with this setup:
- torch.jit.trace() has a memory leak, forcing me to use a clunky workaround.
- On the C++ side, there is also a memory leak in torchlib, forcing me to use an even clunkier workaround (restarting the self-play process every hour). This workaround may cause issues if/when we start tackling games that last a long time (such as 2048).
- The torchlib installation is a pain, requiring you to download a version that matches your machine's CUDA version. This reduces the flexibility of our docker setup. Furthermore, baking the installation step into the Dockerfile empirically leads to unacceptably slow docker image load times when running on runpod.io, which motivates this clunky workaround.
- There are dynamic-library load issues caused by a clash between the Python PyTorch package and the C++ torchlib library. As a result, we cannot use a debug build of the C++ FFI library, which limits our debugging options. There may be a workaround, but I have not been able to find one.
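For reference, here is a minimal sketch of the current export path, assuming placeholder `model` and `example_input` variables (not our actual identifiers):

```python
import torch

# Current approach (sketch): trace the model with an example input and
# serialize it in TorchScript format for consumption on the C++ side.
model.eval()
with torch.no_grad():
    traced = torch.jit.trace(model, example_input)
traced.save("model.pt")  # later loaded in C++ via torch::jit::load("model.pt")
```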
Due to these issues, I would like to retire our dependence on the torchlib library. We can do this by having PyTorch export the models in the open, interoperable ONNX format. On the C++ side, we can use Microsoft's open-source onnxruntime library (a rough sketch of the proposed export path follows the list of benefits below). Besides holding the promise of addressing the above issues, this could bring additional benefits:
A. onnxruntime will likely be faster than torchlib (although this needs to be tested).
B. There are many tools out there to inspect/visualize ONNX model files, such as Netron.
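For concreteness, here is a rough sketch of what the ONNX route could look like on the Python side, with a quick sanity check through onnxruntime's Python bindings; the C++ side would do the equivalent through onnxruntime's Ort::Session / Run() API. The `model` / `example_input` variables and the tensor names ("input", "policy", "value") are placeholders, not our actual names:

```python
import torch
import onnxruntime as ort

# Proposed approach (sketch): export to ONNX instead of TorchScript.
model.eval()
torch.onnx.export(
    model,
    example_input,
    "model.onnx",
    input_names=["input"],
    output_names=["policy", "value"],
    dynamic_axes={"input": {0: "batch"}},  # allow variable batch sizes
)

# Sanity-check the exported file; the C++ self-play process would run the
# same inference through onnxruntime's C++ API.
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
outputs = sess.run(None, {"input": example_input.numpy()})
```

As a bonus, the same model.onnx file can be opened directly in Netron for inspection.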